THE ANNALS 
of 
MATHEMATICAL 
STATISTICS 


THE OFFICIAL JOURNAL OF THE INSTITUTE OF 
MATHEMATICAL STATISTICS 


VOLUME XII 





THE ANNALS 
MATHEMATICAL STATISTICS 


EDITED BY 
S. S. WILKS, Editor 
A. T. CRAIG J. NEYMAN 


WITH THE COOPERATION OF 


H. C. CARVER          R. A. FISHER          R. DE MISES
H. CRAMÉR             T. C. FRY             E. S. PEARSON
W. E. DEMING          H. HOTELLING          H. L. RIETZ
G. DARMOIS            W. A. SHEWHART


Manuscripts for publication in the ANNALS OF MATHEMATICAL STATISTICS 
should be sent to S. S. Wilks, Fine Hall, Princeton, New Jersey. Manuscripts
should be typewritten double-spaced with wide margins, and the original copy 
should be submitted. Footnotes should be reduced to a minimum and whenever 
possible replaced by a bibliography at the end of the paper; formulae in foot- 
notes should be avoided. Figures, charts, and diagrams should be drawn on 
plain white paper or tracing cloth in black India ink twice the size they are to 
be printed. Authors are requested to keep in mind typographical difficulties 
of complicated mathematical formulae. 


Authors will ordinarily receive only galley proofs. Fifty reprints without 
covers will be furnished free. Additional reprints and covers furnished at cost. 


The subscription price for the ANNALS is $4.00 per year. Single copies $1.25. 
Back numbers are available at the following rates: 


Vols. I-IV $5.00 each. Single numbers $1.50. 
Vols. V to date $4.00 each. Single numbers $1.25. 


Subscriptions, renewals, orders for back numbers and other business
communications should be sent to A. T. Craig, University of Iowa, Iowa City, Iowa.


The ANNALS OF MATHEMATICAL STATISTICS is published quarterly by the
Institute of Mathematical Statistics.


COMPOSED AND PRINTED AT THE 
WAVERLY PRESS, Inc. 
BALTIMORE, MD., U. S. A.



















THE ANNALS 
of 
MATHEMATICAL 
STATISTICS 


(FOUNDED BY H. C. CARVER) 


THE OFFICIAL JOURNAL OF THE INSTITUTE
OF MATHEMATICAL STATISTICS 


Contents 
Asymptotically Most Powerful Tests of Statistical Hypotheses. ABRAHAM WALD

Experimental Determination of the Maximum of a Function. HAROLD HOTELLING

On a Statistical Problem Arising in Routine Analyses and in
Sampling Inspections of Mass Production. J. NEYMAN

A Concise Analysis of Certain Algebraic Forms. FRANKLIN E.

A Symmetric Method of Obtaining Unbiased Estimates and Expected
Values. PAUL L. DRESSEL

Determination of Sample Sizes for Setting Tolerance Limits. S. S.

On a Certain Class of Orthogonal Polynomials. FRANK S. BEALE

The Skewness of the Residuals in Linear Regression Theory. P. S.

Notes:

Note on the Adjustment of Observations. ARTHUR J. KAVANAGH

The Estimation of a Quotient When the Denominator is Normally
Distributed. ROBERT D. GORDON

Note on Confidence Limits for Continuous Distribution Functions.
A. WALD AND J. WOLFOWITZ

Report of the Chicago Meeting of the Institute
Abstracts of Papers


Vol. XII, No. 1 — March, 1941 









The ANNALS OF MATHEMATICAL STATISTICS is published quarterly by the
Institute of Mathematical Statistics, Mt. Royal & Guilford Aves., Baltimore,
Md. Subscriptions, renewals, orders for back numbers and other business
communications should be sent to the ANNALS OF MATHEMATICAL STATISTICS, Mt.
Royal & Guilford Aves., Baltimore, Md., or to the Secretary of the Institute
of Mathematical Statistics, E. G. Olds, Carnegie Institute of Technology,
Pittsburgh, Pa.




Entered as second-class matter at the Post Office at Baltimore, Maryland, under the Act of March 3, 1879.











ASYMPTOTICALLY MOST POWERFUL TESTS OF STATISTICAL
HYPOTHESES¹


By ABRAHAM WALD²


Columbia University, New York City 


1. Introduction. Let $f(x, \theta)$ be the probability density function of a variate
$x$ involving an unknown parameter $\theta$. For testing the hypothesis $\theta = \theta_0$ by
means of $n$ independent observations $x_1, \dots, x_n$ on $x$ we have to choose a region
of rejection $W_n$ in the $n$-dimensional sample space. Denote by $P(W_n \mid \theta)$ the
probability that the sample point $E = (x_1, \dots, x_n)$ will fall in $W_n$ under the
assumption that $\theta$ is the true value of the parameter. For any region $U_n$ of
the $n$-dimensional sample space denote by $g(U_n)$ the greatest lower bound of
$P(U_n \mid \theta)$. For any pair of regions $U_n$ and $T_n$ denote by $L(U_n, T_n)$ the least
upper bound of

$P(U_n \mid \theta) - P(T_n \mid \theta).$

In all that follows we shall denote a region of the $n$-dimensional sample space
by a capital letter with the subscript $n$.

Definition 1. A sequence $\{W_n\}$, $(n = 1, 2, \dots,$ ad inf.$)$, of regions is said to
be an asymptotically most powerful test of the hypothesis $\theta = \theta_0$ on the level of
significance $\alpha$ if $P(W_n \mid \theta_0) = \alpha$ and if for any sequence $\{Z_n\}$ of regions for
which $P(Z_n \mid \theta_0) = \alpha$, the inequality

$\limsup_{n=\infty} L(Z_n, W_n) \le 0$

holds.

Definition 2. A sequence $\{W_n\}$, $(n = 1, 2, \dots,$ ad inf.$)$, of regions is said
to be an asymptotically most powerful unbiased test of the hypothesis $\theta = \theta_0$
on the level of significance $\alpha$ if $P(W_n \mid \theta_0) = \lim_{n=\infty} g(W_n) = \alpha$, and if for any
sequence $\{Z_n\}$ of regions for which $P(Z_n \mid \theta_0) = \lim_{n=\infty} g(Z_n) = \alpha$, the inequality

$\limsup_{n=\infty} L(Z_n, W_n) \le 0$

holds.
Let $\hat\theta_n(x_1, \dots, x_n)$ be the maximum likelihood estimate of $\theta$ in the $n$-dimensional
sample space. That is to say, $\hat\theta_n(x_1, \dots, x_n)$ denotes the value of $\theta$


1 Presented to the American Mathematical Society at New York, February 24, 1940. 
2 Research under a grant-in-aid from the Carnegie Corporation of New York. 




for which the product $\prod_{\alpha=1}^{n} f(x_\alpha, \theta)$ becomes a maximum. Let $W_n'$ be the region
defined by the inequality $\sqrt{n}(\hat\theta_n - \theta_0) \ge c_n'$, $W_n''$ the region defined by the inequality
$\sqrt{n}(\hat\theta_n - \theta_0) \le c_n''$, and let $W_n$ consist of all points for which at least one of
the inequalities

$\sqrt{n}(\hat\theta_n - \theta_0) \ge a_n, \qquad \sqrt{n}(\hat\theta_n - \theta_0) \le -a_n$

is satisfied. The constants $a_n$, $c_n'$, $c_n''$ are chosen such that

$P(W_n' \mid \theta_0) = P(W_n'' \mid \theta_0) = P(W_n \mid \theta_0) = \alpha.$

It will be shown in this paper that under certain restrictions on the probability
density $f(x, \theta)$ the sequence $\{W_n'\}$ is an asymptotically most powerful test of the
hypothesis $\theta = \theta_0$ if $\theta$ takes only values $\theta \ge \theta_0$. Similarly $\{W_n''\}$ is an
asymptotically most powerful test if $\theta$ takes only values $\theta \le \theta_0$. Finally $\{W_n\}$ is an
asymptotically most powerful unbiased test if $\theta$ can take any real value.
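
As a rough numerical illustration of the three regions just defined, the following Python sketch assumes the normal model $N(\theta, 1)$. That model is not discussed in the paper; it is chosen here only because its maximum likelihood estimate is the sample mean, so the constants $c_n'$, $c_n''$ and $a_n$ are known exactly. The function names `region_membership` and `rejection_rates` are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def region_membership(sample, theta0, alpha):
    """Say whether the sample point lies in W'_n, W''_n and W_n.

    Assumed model (illustration only): x ~ N(theta, 1), so the maximum
    likelihood estimate is the sample mean and sqrt(n) * (mle - theta0)
    is exactly N(0, 1) when theta = theta0.
    """
    n = len(sample)
    stat = n ** 0.5 * (fmean(sample) - theta0)
    std = NormalDist()
    c_upper = std.inv_cdf(1 - alpha)      # c'_n : size alpha for W'_n
    c_lower = std.inv_cdf(alpha)          # c''_n: size alpha for W''_n
    a = std.inv_cdf(1 - alpha / 2)        # a_n  : size alpha for W_n
    return stat >= c_upper, stat <= c_lower, abs(stat) >= a

def rejection_rates(theta, theta0=0.0, alpha=0.05, n=50, reps=4000, seed=0):
    """Monte Carlo estimates of P(W'_n | theta), P(W''_n | theta), P(W_n | theta)."""
    rng = random.Random(seed)
    counts = [0, 0, 0]
    for _ in range(reps):
        sample = [rng.gauss(theta, 1.0) for _ in range(n)]
        for i, inside in enumerate(region_membership(sample, theta0, alpha)):
            counts[i] += inside
    return [count / reps for count in counts]

if __name__ == "__main__":
    print("theta = theta0:", rejection_rates(0.0))
    print("theta > theta0:", rejection_rates(0.3))
```

Under this assumed model all three estimated rates should sit near $\alpha$ when $\theta = \theta_0$, while for $\theta > \theta_0$ the region $W_n'$ should reject more often than $W_n$, and $W_n''$ should hardly reject at all.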

2. Assumptions on the density function $f(x, \theta)$.

ASSUMPTION 1. For any positive $k$

$\lim_{n=\infty} P(-k < \hat\theta_n - \theta < k \mid \theta) = 1$

uniformly in $\theta$, where $P(-k < \hat\theta_n - \theta < k \mid \theta)$ denotes the probability that
$-k < \hat\theta_n - \theta < k$ under the assumption that $\theta$ is the true value of the parameter.

Assumption 1 implies somewhat more than consistency of the maximum
likelihood estimate $\hat\theta_n$. In fact, consistency means only that for any positive $k$

$\lim_{n=\infty} P(-k < \hat\theta_n - \theta < k \mid \theta) = 1,$

without asking that the convergence should be uniform in $\theta$. If $\hat\theta_n$ satisfies
Assumption 1 we shall say that $\hat\theta_n$ is a uniformly consistent estimate of $\theta$. A
rigorous proof of the consistency of $\hat\theta_n$ (under certain restrictions on $f(x, \theta)$)
was given by J. L. Doob.³ In an appendix to this paper it will be shown that
under certain conditions $\hat\theta_n$ is uniformly consistent.
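
The difference between consistency and uniform consistency can be made concrete with a small simulation. The sketch below assumes the density $f(x, \theta) = \theta e^{-\theta x}$ on $x \ge 0$, which is an illustrative choice and not an example taken from the paper; the function name `coverage` and the parameter values are likewise hypothetical.

```python
import random

def coverage(theta, k, n, reps=2000, seed=0):
    """Monte Carlo estimate of P(-k < mle - theta < k | theta).

    Assumed model (illustration only): f(x, theta) = theta * exp(-theta * x),
    x >= 0, whose maximum likelihood estimate is n / (x_1 + ... + x_n).
    The sum of n such observations has a gamma distribution with shape n
    and scale 1 / theta, so it is drawn directly.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        total = rng.gammavariate(n, 1.0 / theta)
        mle = n / total
        hits += abs(mle - theta) < k
    return hits / reps

if __name__ == "__main__":
    for theta in (0.5, 5.0, 50.0):
        print(theta, coverage(theta, k=0.5, n=200))
```

In this assumed model the probability for fixed $n$ and $k$ falls as $\theta$ grows, so the convergence holds for each $\theta$ but is not uniform over the whole half-line $\theta > 0$; restricting $\theta$ to a bounded interval removes the difficulty in this example.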

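Denote by $E_\theta[\psi(x)]$ the expected value of $\psi(x)$ under the assumption that $\theta$
is the true value of the parameter. That is to say,

$E_\theta[\psi(x)] = \int_{-\infty}^{\infty} \psi(x) f(x, \theta)\, dx.$

For any $x$, for any positive $\delta$, and for any $\theta_1$, denote by $\varphi_1(x, \theta_1, \delta)$ the greatest
lower bound, and by $\varphi_2(x, \theta_1, \delta)$ the least upper bound of $\frac{\partial^2 \log f(x, \theta)}{\partial \theta^2}$ in the
interval $\theta_1 - \delta \le \theta \le \theta_1 + \delta$.
ASSUMPTION 2. There exists a positive value $k_0$ such that the expectations
$E_\theta \varphi_1(x, \theta_1, \delta)$ and $E_\theta \varphi_2(x, \theta_1, \delta)$ exist and are continuous functions of $\theta$, $\theta_1$ and $\delta$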


3 J. L. Doob, "Probability and statistics," Trans. Am. Math. Soc., Vol. 36 (1937).




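in the domain $D$ defined by the inequalities: $0 \le \delta \le \tfrac{1}{2}k_0$, $\theta_0 - \tfrac{1}{2}k_0 \le \theta_1 \le
\theta_0 + \tfrac{1}{2}k_0$, $\theta_0 - k_0 \le \theta \le \theta_0 + k_0$. Furthermore the expectations $E_\theta[\varphi_1(x, \theta_1, \delta)]^2$
and $E_\theta[\varphi_2(x, \theta_1, \delta)]^2$ exist in $D$ and have a finite upper bound in $D$.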

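ASSUMPTION 3. There exists a positive value $k_0$ such that

$\int_{-\infty}^{\infty} \frac{\partial f(x, \theta)}{\partial \theta}\, dx = \int_{-\infty}^{\infty} \frac{\partial^2 f(x, \theta)}{\partial \theta^2}\, dx = 0$ for $\theta_0 - k_0 \le \theta \le \theta_0 + k_0$.

Assumption 3 means simply that we may differentiate with respect to $\theta$ under
the integral sign. In fact

$\int_{-\infty}^{\infty} f(x, \theta)\, dx = 1$

identically in $\theta$. Hence

$\frac{\partial}{\partial \theta} \int_{-\infty}^{\infty} f(x, \theta)\, dx = \frac{\partial^2}{\partial \theta^2} \int_{-\infty}^{\infty} f(x, \theta)\, dx = 0.$

Differentiating under the integral sign, we obtain the relations in Assumption 3.
ASSUMPTION 4. There exists a positive $\eta$ and a positive $k_0$ such that

$E_\theta \left| \frac{\partial \log f(x, \theta)}{\partial \theta} \right|^{2+\eta}$

exists and has a finite upper bound in the interval $\theta_0 - k_0 \le \theta \le \theta_0 + k_0$.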

3. Some propositions. Denote $\sqrt{n}(\hat\theta_n - \theta)$ by $z_n(\theta)$ and denote the
probability $P[z_n(\theta) < t \mid \theta]$ by $\Phi_n(t, \theta)$.

PROPOSITION 1. Within the $\theta$-interval $[\theta_0 - \tfrac{1}{2}k_0, \theta_0 + \tfrac{1}{2}k_0]$ $\Phi_n(t, \theta)$ converges
with $n \to \infty$ uniformly in $t$ and $\theta$ towards the cumulative normal distribution with
zero mean and variance

$-1 \Big/ E_\theta \frac{\partial^2 \log f(x, \theta)}{\partial \theta^2}.$

PROOF: In all that follows we assume that $\theta$ takes only values in the interval
$[\theta_0 - k_0, \theta_0 + k_0]$, except when the contrary is explicitly stated. Furthermore
we introduce the variable $\theta_1$ and assume that $\theta_1$ takes only values in the interval
$[\theta_0 - \tfrac{1}{2}k_0, \theta_0 + \tfrac{1}{2}k_0]$.

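Because of Assumption 3 we have

(1)  $E_\theta \frac{\partial \log f(x, \theta)}{\partial \theta} = \int_{-\infty}^{\infty} \frac{\partial f(x, \theta)}{\partial \theta}\, dx = 0.$

Since

$\frac{\partial^2 \log f(x, \theta)}{\partial \theta^2} = \frac{1}{f(x, \theta)} \frac{\partial^2 f(x, \theta)}{\partial \theta^2} - \frac{1}{f(x, \theta)^2} \left[ \frac{\partial f(x, \theta)}{\partial \theta} \right]^2,$

we get from Assumption 3

(2)  $E_\theta \frac{\partial^2 \log f(x, \theta)}{\partial \theta^2} = -E_\theta \left[ \frac{\partial \log f(x, \theta)}{\partial \theta} \right]^2.$

Hence

(3)  $d(\theta) = -E_\theta \frac{\partial^2 \log f(x, \theta)}{\partial \theta^2} \ge 0.$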
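
The relations in Assumption 3 and the identities (1) to (3) can be checked numerically for a concrete density. The sketch below again assumes $f(x, \theta) = \theta e^{-\theta x}$ on $x \ge 0$, an illustrative choice that does not come from the paper; the helper names `trapezoid` and `check_identities` are hypothetical.

```python
import math

def trapezoid(g, lo, hi, steps=20000):
    """Composite trapezoidal rule for the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (g(lo) + g(hi))
    for i in range(1, steps):
        total += g(lo + i * h)
    return total * h

def check_identities(theta, upper=60.0):
    """Numerically evaluate the quantities in Assumption 3 and in (1)-(3).

    Assumed density (illustration only): f(x, theta) = theta * exp(-theta * x)
    on x >= 0. The integrals are truncated at `upper`, where the integrands
    are negligible for moderate theta.
    """
    f = lambda x: theta * math.exp(-theta * x)
    df = lambda x: (1.0 - theta * x) * math.exp(-theta * x)            # df / dtheta
    d2f = lambda x: (theta * x * x - 2.0 * x) * math.exp(-theta * x)   # d2f / dtheta2
    score = lambda x: 1.0 / theta - x                                  # dlog f / dtheta
    d2log = lambda x: -1.0 / theta ** 2                                # d2log f / dtheta2
    return {
        "int_df": trapezoid(df, 0.0, upper),
        "int_d2f": trapezoid(d2f, 0.0, upper),
        "mean_score": trapezoid(lambda x: score(x) * f(x), 0.0, upper),
        "mean_score_sq": trapezoid(lambda x: score(x) ** 2 * f(x), 0.0, upper),
        "d": -trapezoid(lambda x: d2log(x) * f(x), 0.0, upper),
    }

if __name__ == "__main__":
    for name, value in check_identities(1.5).items():
        print(name, value)
```

For this assumed density the first three entries should be close to zero, and `mean_score_sq` should agree with `d`, which is the content of (2) and (3).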
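Consider the Taylor expansion

(4)  $\sum_\alpha \frac{\partial \log f(x_\alpha, \theta)}{\partial \theta} = \sum_\alpha \frac{\partial \log f(x_\alpha, \theta_1)}{\partial \theta} + (\theta - \theta_1) \sum_\alpha \frac{\partial^2 \log f(x_\alpha, \theta')}{\partial \theta^2},$

where $\theta'$ lies in the interval $[\theta_1, \theta]$. Denote $\frac{1}{\sqrt{n}} \sum_\alpha \frac{\partial \log f(x_\alpha, \theta_1)}{\partial \theta}$ by $y_n(\theta_1)$.

For $\theta = \hat\theta_n$ the left hand side of (4) is equal to zero. Hence we have

(5)  $y_n(\theta_1) + \frac{1}{n} \sqrt{n}(\hat\theta_n - \theta_1) \sum_\alpha \frac{\partial^2 \log f(x_\alpha, \theta')}{\partial \theta^2} = 0,$

or

(6)  $y_n(\theta_1) + z_n(\theta_1) \cdot \frac{1}{n} \sum_\alpha \frac{\partial^2 \log f(x_\alpha, \theta')}{\partial \theta^2} = 0.$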

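Let $Q_n(\theta_1)$ be the region defined by the inequality

(7)  $\left| \frac{1}{n} \sum_\alpha \frac{\partial^2 \log f(x_\alpha, \theta')}{\partial \theta^2} + d(\theta_1) \right| < \nu,$

where $\nu$ denotes a positive number less than the greatest lower bound of $d(\theta)$.
We shall prove that

(8)  $\lim_{n=\infty} P[Q_n(\theta_1) \mid \theta_1] = 1$

uniformly in $\theta_1$. Let $\tau_0$ be a positive number such that

(9)  $\left| E_{\theta_1} \varphi_i(x, \theta_1, \tau_0) - E_{\theta_1} \frac{\partial^2 \log f(x, \theta_1)}{\partial \theta^2} \right| < \frac{\nu}{2}$   $(i = 1, 2)$

for all values of $\theta_1$. Because of Assumption 2 such a $\tau_0$ certainly exists.
Denote by $R_n(\theta_1)$ the region defined by the inequality

(10)  $|\hat\theta_n - \theta_1| < \tau_0.$

On account of Assumption 1

(11)  $\lim_{n=\infty} P[R_n(\theta_1) \mid \theta_1] = 1$

uniformly in $\theta_1$. Since $\theta'$ lies in the interval $[\theta_1, \hat\theta_n]$, we have

(12)  $|\theta' - \theta_1| < \tau_0$

for all points in $R_n(\theta_1)$. Hence at any point in $R_n(\theta_1)$ the inequality

(13)  $\sum_\alpha \varphi_1(x_\alpha, \theta_1, \tau_0) \le \sum_\alpha \frac{\partial^2 \log f(x_\alpha, \theta')}{\partial \theta^2} \le \sum_\alpha \varphi_2(x_\alpha, \theta_1, \tau_0)$

holds.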








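Let $S_n(\theta_1)$ be defined by the inequality

(14)  $\left| \frac{1}{n} \sum_\alpha \varphi_1(x_\alpha, \theta_1, \tau_0) - E_{\theta_1} \varphi_1(x, \theta_1, \tau_0) \right| < \frac{\nu}{2}$

and $T_n(\theta_1)$ by the inequality

(15)  $\left| \frac{1}{n} \sum_\alpha \varphi_2(x_\alpha, \theta_1, \tau_0) - E_{\theta_1} \varphi_2(x, \theta_1, \tau_0) \right| < \frac{\nu}{2}.$

On account of Assumption 2 we have

(16)  $\lim_{n=\infty} P[S_n(\theta_1) \mid \theta_1] = \lim_{n=\infty} P[T_n(\theta_1) \mid \theta_1] = 1$

uniformly in $\theta_1$.

Denote by $U_n(\theta_1)$ the common part of the regions $R_n(\theta_1)$, $S_n(\theta_1)$ and $T_n(\theta_1)$.
In $U_n(\theta_1)$ we have on account of (9), (14) and (15)

(17)  $\left| \frac{1}{n} \sum_\alpha \varphi_i(x_\alpha, \theta_1, \tau_0) - E_{\theta_1} \frac{\partial^2 \log f(x, \theta_1)}{\partial \theta^2} \right| < \nu$   $(i = 1, 2).$

From this we obtain (7) because of (13). That is to say, the inequality (7) is
valid everywhere in $U_n(\theta_1)$. Since

$\lim_{n=\infty} P[U_n(\theta_1) \mid \theta_1] = 1$

uniformly in $\theta_1$, our statement about $Q_n(\theta_1)$ is proved. From (6) and (7) we
get that everywhere in $Q_n(\theta_1)$ the inequalities hold:

(18)  $\frac{y_n(\theta_1)}{d(\theta_1) + \nu} \le z_n(\theta_1) \le \frac{y_n(\theta_1)}{d(\theta_1) - \nu}$ if $y_n(\theta_1) \ge 0$;

(19)  $\frac{y_n(\theta_1)}{d(\theta_1) - \nu} \le z_n(\theta_1) \le \frac{y_n(\theta_1)}{d(\theta_1) + \nu}$ if $y_n(\theta_1) < 0$.

Let $z_n^*(\theta_1)$ be defined as follows: $z_n^*(\theta_1) = z_n(\theta_1)$ at any point in $Q_n(\theta_1)$, and
$z_n^*(\theta_1) = y_n(\theta_1)/d(\theta_1)$ at any point outside $Q_n(\theta_1)$.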
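On account of (8) we obviously have

(20)  $\lim_{n=\infty} \{ P[z_n^*(\theta_1) < t \mid \theta_1] - P[z_n(\theta_1) < t \mid \theta_1] \} = 0$

uniformly in $t$ and $\theta_1$.

From equation (1) it follows that $E_{\theta_1} y_n(\theta_1) = 0$. From Assumption 4 it follows
on account of the general limit theorems

(21)  $\lim_{n=\infty} \left\{ P[y_n(\theta_1) < t \mid \theta_1] - \frac{1}{\sqrt{2\pi d(\theta_1)}} \int_{-\infty}^{t} e^{-\frac{1}{2} v^2 / d(\theta_1)}\, dv \right\} = 0$

uniformly in $t$ and $\theta_1$. Hence

$\lim_{n=\infty} \left\{ P\left[ \frac{y_n(\theta_1)}{d(\theta_1)} < t \,\Big|\, \theta_1 \right] - \sqrt{\frac{d(\theta_1)}{2\pi}} \int_{-\infty}^{t} e^{-\frac{1}{2} d(\theta_1) v^2}\, dv \right\} = 0$

uniformly in $t$ and $\theta_1$. Since $\nu$ can be chosen arbitrarily small, we get easily
from (18), (19), (20) and (21)

(22)  $\lim_{n=\infty} \left| P\left[ \frac{y_n(\theta_1)}{d(\theta_1)} < t \,\Big|\, \theta_1 \right] - P[z_n(\theta_1) < t \mid \theta_1] \right| = 0$

uniformly in $t$ and $\theta_1$. Proposition 1 follows from (21) and (22).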
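
Proposition 1 can be examined by simulation for a particular density. The sketch below assumes $f(x, \theta) = \theta e^{-\theta x}$ on $x \ge 0$ (an illustrative choice, not taken from the paper), for which the maximum likelihood estimate is the reciprocal of the sample mean and the limiting variance $-1 / E_\theta[\partial^2 \log f / \partial \theta^2]$ equals $\theta^2$. The function name `simulate_z` is hypothetical.

```python
import random
from statistics import NormalDist, fmean, pvariance

def simulate_z(theta, n, reps=2000, seed=0):
    """Draw `reps` values of z_n(theta) = sqrt(n) * (mle - theta).

    Assumed model (illustration only): f(x, theta) = theta * exp(-theta * x),
    for which the maximum likelihood estimate is 1 / (sample mean) and
    -1 / E[d2 log f / dtheta2] equals theta ** 2.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        sample = [rng.expovariate(theta) for _ in range(n)]
        mle = 1.0 / fmean(sample)
        values.append(n ** 0.5 * (mle - theta))
    return values

if __name__ == "__main__":
    theta, n = 2.0, 400
    z = simulate_z(theta, n)
    limit = NormalDist(0.0, theta)      # standard deviation theta, variance theta ** 2
    t = theta                           # one limiting standard deviation above zero
    empirical = sum(value < t for value in z) / len(z)
    print("empirical variance:", pvariance(z), "limiting variance:", theta ** 2)
    print("P[z_n < t]:", empirical, "normal limit:", limit.cdf(t))
```

For moderate $n$ the empirical values should be close to, but not identical with, the limiting ones, since the proposition is a statement about $n \to \infty$.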
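PROPOSITION 2. Let $\{W_n\}$ be a sequence of regions of size $\alpha$, i.e. $P(W_n \mid \theta_0) = \alpha$,
and let $V_n(z)$ be the region defined by the inequality

$(\hat\theta_n - \theta_0)\sqrt{n} < z.$

Let $U_n(z)$ be the intersection of $V_n(z)$ and $W_n$, and denote $P[U_n(z) \mid \theta_0]$ by $F_n(z)$.
Denote furthermore $P[W_n \mid \theta_0 + \mu/\sqrt{n}]$ by $G(\mu, n)$. If $F_n(z)$ converges to $F(z)$
and if $\lim_{n=\infty} \mu_n = \mu$, then

(23)  $\lim_{n=\infty} G(\mu_n, n) = \int_{-\infty}^{\infty} e^{-\frac{1}{2}(\mu^2 - 2\mu z)/c}\, dF(z),$

where

$c = -1 \Big/ E_{\theta_0} \frac{\partial^2 \log f(x, \theta_0)}{\partial \theta^2}.$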

PROOF: First we show

(24)  $\int_{-\infty}^{\infty} dF(z) = \alpha.$

Denote $P[V_n(z) \mid \theta_0]$ by $\Phi_n(z)$. On account of Proposition 1 $\Phi_n(z)$ converges
uniformly to the cumulative normal distribution $\psi(z)$ with zero mean and
variance $c$. It is obvious that

(25)  $F_n(z_2) - F_n(z_1) \le \Phi_n(z_2) - \Phi_n(z_1)$ for $z_2 > z_1$.

Hence

(26)  $F(z_2) - F(z_1) \le \psi(z_2) - \psi(z_1)$ for $z_2 > z_1$.

From (25) we get

(27)  $\left[ \lim_{z=\infty} F_n(z) \right] - F_n(z) = \alpha - F_n(z) \le 1 - \Phi_n(z).$

Hence

(28)  $\alpha - F(z) \le 1 - \psi(z).$

Since $F_n(z) \le \alpha$ and therefore also $F(z) \le \alpha$, we get from (28)

$0 \le \alpha - F(z) \le 1 - \psi(z).$

Hence

(29)  $\lim_{z=\infty} F(z) = \alpha.$

Since $F_n(z) \le \Phi_n(z)$, we have $F(z) \le \psi(z)$, and therefore

(30)  $\lim_{z=-\infty} F(z) = 0.$

The equation (24) follows from (29) and (30).
It follows easily from (26) that the integral on the right hand side of the
equation (23) exists and is finite.
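Let us denote $\theta_0 + \mu_n/\sqrt{n}$ by $\theta_n$. Consider the Taylor expansions

(31)  $\sum_\alpha \log f(x_\alpha, \theta_0) = \sum_\alpha \log f(x_\alpha, \hat\theta_n) + (\theta_0 - \hat\theta_n) \sum_\alpha \frac{\partial \log f(x_\alpha, \hat\theta_n)}{\partial \theta} + \tfrac{1}{2}(\theta_0 - \hat\theta_n)^2 \sum_\alpha \frac{\partial^2 \log f(x_\alpha, \theta_n')}{\partial \theta^2}$

and

(32)  $\sum_\alpha \log f(x_\alpha, \theta_n) = \sum_\alpha \log f(x_\alpha, \hat\theta_n) + (\theta_n - \hat\theta_n) \sum_\alpha \frac{\partial \log f(x_\alpha, \hat\theta_n)}{\partial \theta} + \tfrac{1}{2}(\theta_n - \hat\theta_n)^2 \sum_\alpha \frac{\partial^2 \log f(x_\alpha, \theta_n'')}{\partial \theta^2},$

where $\theta_n'$ lies in the interval $[\theta_0, \hat\theta_n]$ and $\theta_n''$ lies in the interval $[\theta_n, \hat\theta_n]$. Since
$\hat\theta_n$ is the maximum likelihood estimate, we get from (31) and (32)

(33)  $\sum_\alpha \log f(x_\alpha, \theta_0) = \sum_\alpha \log f(x_\alpha, \hat\theta_n) + \tfrac{1}{2}(\theta_0 - \hat\theta_n)^2 \sum_\alpha \frac{\partial^2 \log f(x_\alpha, \theta_n')}{\partial \theta^2};$

(34)  $\sum_\alpha \log f(x_\alpha, \theta_n) = \sum_\alpha \log f(x_\alpha, \hat\theta_n) + \tfrac{1}{2}(\theta_n - \hat\theta_n)^2 \sum_\alpha \frac{\partial^2 \log f(x_\alpha, \theta_n'')}{\partial \theta^2}.$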

Denote by 8 a real variable which can take any value between —2y and +2u,. 
Denote by R, the region defined by the inequality ~ 


(35) | 6, — 4 | < n, 
From Proposition 1 it follows easily that . 
(36) lim P(Ra|% + 6/~/n) = 1 


uniformly in 8. Denote 2n* by 7,. Then for almost all n the following 
inequalities hold at any point in R, : 


37) Leilte,, 70) < X © log f(ta, 04) < X elt, fo, 70); 
(88) Leen, Ho, 70) SL Flow Slee, 6a) < LO elt, , 7). 


Denote by S, the region in which (35), (37) and (38) simultaneously hold. It 
is obvious that 


lim P(S, | + 8/+/n) = 1 










uniformly in 8. Denote 6 + 8/+/n by 6,(8). From Assumption 2 it follows 


easily that 


> ¢i(Le, 6, Tn) 
(39) lim Eo, 18) a 





n ° 06 c : 


uniformly in 8. Furthermore the variance of >, 8) , if 6,(8) is the 


true value of the parameter 6, converges to zero with n — © uniformly in 8. 
Hence a sequence {A,}, (n = 1, 2, --- , ad inf.), of positive numbers can be 
given such that 


(40) lim \, = 0 
and 
(41) lim P[T’, | @,(8)] = 1 


uniformly in 8, where the region 7’, is defined by the inequality 


(42) 





i ¢i(La, 60, Tn) 4 1 


| <i* (i = 1,2). 
n Cc 


From (37) and (38) it follows that in the intersection 7), of 7, and S, 


1 a ’ 1 nf 
(43) 2 Xu ae log f(ta, On) + ~ < Ann 
and 

lin & a - 
(44) |; SD Flog f(a, On) + | <r,n7. 


We get from (33), (34), (35), (43) and (44) that at any point in 7, 
: (45) z. log f(ze, 6x) = x log f(te, 60) — x [(@0 i 6,)° a (6, iain 6,)"] + . 


where | A, | < pAn, and p denotes a constant not depending on n. 
On account of (36) and (41) we have 


(46) lim P[T., | @,(8)] = 1 


uniformly in £8. 

Denote by $T_n^*(z)$ the intersection of $U_n(z)$ (defined in Proposition 2) and $T_n''$.
Denote furthermore $P[T_n^*(z) \mid \theta_0]$ by $F_n^*(z)$.

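Since

$n[(\theta_0 - \hat\theta_n)^2 - (\theta_n - \hat\theta_n)^2] = n[(\theta_0 - \hat\theta_n)^2 - (\theta_0 - \hat\theta_n + \mu_n/\sqrt{n})^2]$
$= -\mu_n^2 + 2\sqrt{n}\,\mu_n(\hat\theta_n - \theta_0),$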





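we get from (45) and (46)

(47)  $\lim_{n=\infty} \left\{ P[T_n^*(z) \mid \theta_n] - \int_{-\infty}^{z} e^{-\frac{1}{2}(\mu_n^2 - 2\mu_n t)/c}\, dF_n^*(t) \right\} = 0$

uniformly in $z$. It is obvious that

(48)  $\lim_{n=\infty} \{ P[T_n^*(z) \mid \theta_n] - P[U_n(z) \mid \theta_n] \} = 0$

uniformly in $z$. Hence we get from (47)

(49)  $\lim_{n=\infty} \left\{ P[U_n(z) \mid \theta_n] - \int_{-\infty}^{z} e^{-\frac{1}{2}(\mu_n^2 - 2\mu_n t)/c}\, dF_n^*(t) \right\} = 0$

uniformly in $z$. It follows from (49) that for any positive $L$

(50)  $\lim_{n=\infty} \left\{ P[U_n(L) \mid \theta_n] - P[U_n(-L) \mid \theta_n] - \int_{-L}^{L} e^{-\frac{1}{2}(\mu_n^2 - 2\mu_n t)/c}\, dF_n^*(t) \right\} = 0.$

Since $\lim \mu_n = \mu$, $\lim [F_n^*(t) - F_n(t)] = 0$ uniformly in $t$, and since $\lim F_n(t) =
F(t)$ uniformly in $t$, we get from (50)

(51)  $\lim_{n=\infty} \{ P[U_n(L) \mid \theta_n] - P[U_n(-L) \mid \theta_n] \} = \int_{-L}^{L} e^{-\frac{1}{2}(\mu^2 - 2\mu t)/c}\, dF(t).$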


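Now let us calculate the limit of $P[V_n(z) \mid \theta_n]$ if $n \to \infty$. The region $V_n(z)$ is
defined by the inequality

(52)  $(\hat\theta_n - \theta_0)\sqrt{n} < z.$

This inequality can be written as follows:

(53)  $(\hat\theta_n - \theta_n)\sqrt{n} < z - \mu_n.$

Since $\lim \mu_n = \mu$, we get on account of Proposition 1

(54)  $\lim_{n=\infty} P[(\hat\theta_n - \theta_n)\sqrt{n} < z - \mu_n \mid \theta_n] = \frac{1}{\sqrt{2\pi c}} \int_{-\infty}^{z-\mu} e^{-\frac{1}{2} t^2/c}\, dt = \frac{1}{\sqrt{2\pi c}} \int_{-\infty}^{z} e^{-\frac{1}{2}(t-\mu)^2/c}\, dt.$

Hence

(55)  $\lim_{n=\infty} P[V_n(z) \mid \theta_n] = \frac{1}{\sqrt{2\pi c}} \int_{-\infty}^{z} e^{-\frac{1}{2}(t-\mu)^2/c}\, dt$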

uniformly in z. 
For any positive $\epsilon$ let $L_\epsilon$ denote the positive number satisfying the condition:


1 wo / ° Hen) 2% |- € 
(56) " If. e dt + i e dt| = 5° 














From (56) we easily get on account of (26) 


(57) 0 < [ e But—2ut)/e dF (t) _ [ odut—2us) /e dF(t) < a 
oo Le 


Since the region $U_n(z_2) - U_n(z_1)$ is a subset of $V_n(z_2) - V_n(z_1)$ for $z_2 > z_1$,
we have on account of (55) and (56)


(58) lim sup | {P[Un(%) | 6,] — P[U»(L.) | On] + P[U,(—L.) | 6n]} | < 5° 


Since

$P[U_n(\infty) \mid \theta_n] = G(\mu_n, n),$

we have


(59) lim sup | @(un, m) — {P[Un(Le) |] — PLUa(—L.) | Ol} | < 5. 


From (51), (57) and (59) we get 


0) - [ etude le aR(I)| < ¢, 





(60) lim sup 
Since $\epsilon$ can be chosen arbitrarily small, Proposition 2 is proved.


4. Theorems on asymptotically most powerful tests. 

THEOREM 1: Let $M_n$ be the region defined by the inequality $\sqrt{n}(\hat\theta_n - \theta_0) \ge A_n$,
where $A_n$ is chosen such that $P(M_n \mid \theta_0) = \alpha$. Then $\{M_n\}$ is an asymptotically
most powerful test of the hypothesis $\theta = \theta_0$, provided the parameter $\theta$ is restricted
to values $\ge \theta_0$.

PROOF: Assume that there exists a test $\{W_n\}$ of size $\alpha$ such that

(61)  $\limsup_{n=\infty} L(W_n, M_n) = \delta > 0.$

Then there exists a subsequence $\{n'\}$ of the sequence $\{n\}$ and a sequence $\{\theta_{n'}\}$
of parameter values $\ge \theta_0$ such that

(62)  $\lim_{n'=\infty} \{ P(W_{n'} \mid \theta_{n'}) - P(M_{n'} \mid \theta_{n'}) \} = \delta.$

The expression

(63)  $(\theta_{n'} - \theta_0)\sqrt{n'} = \mu_{n'} \ge 0$

must be bounded. This can be proved as follows: Since under the assumption
$\theta = \theta_0$ the distribution of $\sqrt{n}(\hat\theta_n - \theta_0)$ converges to a normal distribution with
zero mean and finite variance, the sequence $\{A_n\}$ must be bounded. Hence $M_n$
is defined by the inequality

(64)  $\hat\theta_n - \theta_0 \ge A_n/\sqrt{n} = \epsilon_n,$

where

(65)  $\lim_{n=\infty} \epsilon_n = 0.$

From Assumption 1, (64) and (65) it follows easily that if
$\lim \theta_{n'} = \theta_1 > \theta_0$, then $\lim P(M_{n'} \mid \theta_{n'}) = 1$.

Hence on account of (62) we must have

(66)  $\lim_{n'=\infty} \theta_{n'} = \theta_0.$

If there existed a subsequence $\{n^*\}$ of $\{n'\}$ such that $\lim \mu_{n^*} = \infty$, then
on account of (66) and Proposition 1 we would have $\lim P(M_{n^*} \mid \theta_{n^*}) = 1$,
which is in contradiction to (62). Hence the expression (63) must be bounded.
Let $\{n''\}$ be a subsequence of $\{n'\}$ such that

(67)  $\lim \mu_{n''} = \mu \ge 0.$

Denote by $F_n(z)$ the probability of the intersection of $W_n$ and the region
$(\hat\theta_n - \theta_0)\sqrt{n} < z$ under the hypothesis that $\theta = \theta_0$. Consider the
subsequence $\{n'''\}$ of the sequence $\{n''\}$ such that $F_{n'''}(z)$ converges with $n \to \infty$
towards a function $F(z)$. The existence of such a subsequence $\{n'''\}$ can be
proved as follows: Denote the probability $P[(\hat\theta_n - \theta_0)\sqrt{n} < z \mid \theta_0]$ by $\Phi_n(z)$.
On account of Proposition 1, $\Phi_n(z)$ converges with $n \to \infty$ uniformly in $z$ towards

(68)  $\psi(z) = \frac{1}{\sqrt{2\pi c}} \int_{-\infty}^{z} e^{-\frac{1}{2} t^2/c}\, dt,$

where $c$ has the same value as in (23).
We obviously have

(69)  $F_n(z_2) - F_n(z_1) \le \Phi_n(z_2) - \Phi_n(z_1)$

for any pair of values $z_1$, $z_2$ for which $z_2 > z_1$. Hence

(70)  $\limsup_{n=\infty}\, [F_n(z_2) - F_n(z_1)] \le \psi(z_2) - \psi(z_1).$

Since $F_n(z)$ is a monotonic function of $z$, our statement follows easily from (70)
and the fact that $\psi(z)$ is uniformly continuous. Hence on account of
Proposition 2 we have

(71)  $\lim_{n=\infty} P(W_{n'''} \mid \theta_{n'''}) = \int_{-\infty}^{\infty} e^{-\frac{1}{2}(\mu^2 - 2\mu z)/c}\, dF(z)$

and

(72)  $\lim_{n=\infty} P(M_{n'''} \mid \theta_{n'''}) = \int_{-\infty}^{\infty} e^{-\frac{1}{2}(\mu^2 - 2\mu z)/c}\, d\Phi(z),$

where

(73)  $\Phi(z) = 0$ for $z < z_0$,

(74)  $\Phi(z) = \psi(z) - \psi(z_0)$ for $z \ge z_0$,

and $z_0$ is given by

(75)  $1 - \psi(z_0) = \alpha.$

From (62), (71) and (72) we get

(76)  $\int_{-\infty}^{\infty} e^{-\frac{1}{2}(\mu^2 - 2\mu z)/c}\, d[F(z) - \Phi(z)] = \delta > 0.$

Consider a normally distributed variate $y$ with mean $\nu$ and variance $c$. Let $B$
be a critical region of size $\alpha$ for testing the hypothesis $\nu = 0$ by a single observation
on $y$, i.e. $B$ is a subset of the real axis $[-\infty, +\infty]$. Denote by $D(v)$ the
intersection of $B$ and the region $C(v)$ defined by the inequality $y < v$. Denote by
$H(v)$ the probability of $D(v)$ under the hypothesis $\nu = 0$. Then the power of
the test $B$ with respect to the alternative $\nu = \mu$ is given by the following
expression

(77)  $\int_{-\infty}^{\infty} e^{-\frac{1}{2}(\mu^2 - 2\mu v)/c}\, dH(v).$

If the region $B$ is given by the inequality $y \ge v_0$ where $v_0$ is chosen such that the
size of $B$ is equal to $\alpha$, then $H(v) = \Phi(v)$ where the function $\Phi$ is defined by the
equations (73), (74) and (75). Since the latter test is uniformly most powerful⁴
with respect to all alternatives $\nu > 0$, for any positive $\mu$ the inequality

(78)  $\int_{-\infty}^{\infty} e^{-\frac{1}{2}(\mu^2 - 2\mu v)/c}\, d[H(v) - \Phi(v)] \le 0$

holds. Let

$\psi(v) = \frac{1}{\sqrt{2\pi c}} \int_{-\infty}^{v} e^{-\frac{1}{2} t^2/c}\, dt.$

It is obvious that

(79)  $H(v_2) - H(v_1) \le \psi(v_2) - \psi(v_1)$ for $v_2 > v_1$

and

(80)  $\int_{-\infty}^{\infty} dH(v) = \alpha.$






4 See for instance J. Neyman and E. S. Pearson, "Contributions to the theory of testing
statistical hypotheses," Stat. Res. Memoirs, Vol. 1 (1936).











On the other hand, if $K(v)$ is a monotonically non-decreasing non-negative
function of $v$ such that

(79')  $K(v_2) - K(v_1) \le \psi(v_2) - \psi(v_1)$ for $v_2 > v_1$

and

(80')  $\int_{-\infty}^{\infty} dK(v) = \alpha$

hold, then there exists a sequence $\{B^{(i)}\}$, $(i = 1, 2, \dots,$ ad inf.$)$, of regions of
size $\alpha$ such that

$\lim_{i=\infty} H^{(i)}(v) = K(v)$

uniformly in $v$. Since (78) holds for $H(v) = H^{(i)}(v)$, and since

$H^{(i)}(v_2) - H^{(i)}(v_1) \le \psi(v_2) - \psi(v_1)$ for $v_2 > v_1$,

it is easy to see that (78) will hold also for $H(v) = K(v)$. Hence for any
monotonically non-decreasing non-negative function $K(v)$ for which (79') and (80')
are fulfilled, also (78) must hold. Since $F(v)$ is a distribution function which
satisfies (79') and (80'), we have a contradiction to (76). This proves Theorem 1.
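
The limiting power in (72) can be evaluated numerically and compared with that of another region of the same size. The sketch below integrates the expression in (23) by a midpoint rule for regions that depend on the sample only through $z = \sqrt{n}(\hat\theta_n - \theta_0)$, so that $dF(z) = \psi'(z)\,dz$ on the rejection set and zero elsewhere. The two-sided competitor, the values $c = 1$ and $\alpha = 0.05$, and the function names are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def limit_power(mu, c, reject, span=12.0, grid=40000):
    """Midpoint-rule approximation of the integral in (23).

    dF(z) is taken to be psi'(z) dz where reject(z) is true and zero
    elsewhere; psi is the normal distribution function with mean 0 and
    variance c.
    """
    sd = math.sqrt(c)
    lo, hi = -span * sd - abs(mu), span * sd + abs(mu)
    h = (hi - lo) / grid
    psi = NormalDist(0.0, sd)
    total = 0.0
    for i in range(grid):
        z = lo + (i + 0.5) * h
        if reject(z):
            total += math.exp(-0.5 * (mu * mu - 2.0 * mu * z) / c) * psi.pdf(z) * h
    return total

def one_sided_and_two_sided(mu, c=1.0, alpha=0.05):
    """Limiting power of z >= z0 and of |z| >= z1, both of size alpha."""
    sd = math.sqrt(c)
    z0 = sd * NormalDist().inv_cdf(1 - alpha)        # (75): 1 - psi(z0) = alpha
    z1 = sd * NormalDist().inv_cdf(1 - alpha / 2)    # two-sided region of size alpha
    one = limit_power(mu, c, lambda z: z >= z0)
    two = limit_power(mu, c, lambda z: abs(z) >= z1)
    return one, two

if __name__ == "__main__":
    for mu in (0.0, 0.5, 1.5, 3.0):
        print(mu, one_sided_and_two_sided(mu))
```

For positive $\mu$ the one-sided value should exceed the two-sided one, in line with (78), and at $\mu = 0$ both should reduce to $\alpha$.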

THEOREM 2: Let $M_n$ be the region defined by the inequality $\sqrt{n}(\hat\theta_n - \theta_0) \le A_n$,
where $A_n$ is chosen such that $P(M_n \mid \theta_0) = \alpha$. Then $\{M_n\}$ is an asymptotically
most powerful test of the hypothesis $\theta = \theta_0$, provided that the parameter $\theta$ is restricted
to values $\le \theta_0$.

We omit the proof since it is entirely analogous to that of Theorem 1.

THEOREM 3: Let $M_n$ be the region consisting of all points which satisfy at least
one of the inequalities

$\sqrt{n}(\hat\theta_n - \theta_0) \le -A_n, \qquad \sqrt{n}(\hat\theta_n - \theta_0) \ge A_n.$

The constant $A_n > 0$ is chosen such that $P(M_n \mid \theta_0) = \alpha$. Then $\{M_n\}$ is an
asymptotically most powerful unbiased test of the hypothesis $\theta = \theta_0$.

PROOF: Assume that there exists a sequence $\{W_n\}$ $(n = 1, 2, \dots,$ ad inf.$)$
of regions such that

(81)  $P(W_n \mid \theta_0) = \alpha,$

(82)  $\lim_{n=\infty} g(W_n) = \alpha$

and

(83)  $\limsup_{n=\infty} L(W_n, M_n) = \delta > 0.$

We shall deduce a contradiction from this assumption. On account of (83)
there exists a subsequence $\{n'\}$ of $\{n\}$ such that

(84)  $\lim_{n'=\infty} \{ P(W_{n'} \mid \theta_{n'}) - P(M_{n'} \mid \theta_{n'}) \} = \delta.$








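The expression

(85)  $(\theta_{n'} - \theta_0)\sqrt{n'} = \mu_{n'}$

must be bounded. The proof of this statement is omitted, since it is analogous
to the proof of the similar statement about (63). Hence there exists a
subsequence $\{n''\}$ of $\{n'\}$ such that

(86)  $\lim \mu_{n''} = \mu.$

Denote by $F_n(z)$ the probability of the intersection of $W_n$ with the region
$(\hat\theta_n - \theta_0)\sqrt{n} < z$ under the hypothesis $\theta = \theta_0$. Consider a subsequence $\{n'''\}$
of $\{n''\}$ such that $F_{n'''}(z)$ converges with $n \to \infty$ towards a function $F(z)$.
The existence of such a sequence $\{n'''\}$ can be proved in the same way as the
similar statement in the proof of Theorem 1. Hence on account of Proposition 2
and (86) we have

(87)  $\lim_{n=\infty} P(W_{n'''} \mid \theta_{n'''}) = \int_{-\infty}^{\infty} e^{-\frac{1}{2}(\mu^2 - 2\mu z)/c}\, dF(z)$

and

(88)  $\lim_{n=\infty} P(M_{n'''} \mid \theta_{n'''}) = \int_{-\infty}^{\infty} e^{-\frac{1}{2}(\mu^2 - 2\mu z)/c}\, d\Phi(z),$

where

(89)  $\Phi(z) = \frac{1}{\sqrt{2\pi c}} \int_{-\infty}^{z} e^{-\frac{1}{2} t^2/c}\, dt$ for $z \le -z_0$,

(90)  $\Phi(z) = \Phi(-z_0)$ for $-z_0 < z < z_0$,

(91)  $\Phi(z) = \Phi(-z_0) + \frac{1}{\sqrt{2\pi c}} \int_{z_0}^{z} e^{-\frac{1}{2} t^2/c}\, dt$ for $z \ge z_0$,

and

(92)  $\Phi(-z_0) = \tfrac{1}{2}\alpha.$

From (84), (87) and (88) it follows that

(93)  $\int_{-\infty}^{\infty} e^{-\frac{1}{2}(\mu^2 - 2\mu z)/c}\, d[F(z) - \Phi(z)] = \delta.$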


Consider a normally distributed variate $y$ with mean $\nu$ and variance $c$. Let $B$ be
an unbiased critical region of size $\alpha$ for testing the hypothesis $\nu = 0$ by a single
observation on $y$, i.e. $B$ is a subset of the real axis $[-\infty, +\infty]$. Denote by
$D(v)$ the intersection of $B$ with the region $C(v)$ defined by the inequality $y < v$.
Denote by $H(v)$ the probability of $D(v)$ under the hypothesis $\nu = 0$. Then
the power of the test $B$ with respect to the alternative $\nu = \mu$ is given by

(94)  $\int_{-\infty}^{\infty} e^{-\frac{1}{2}(\mu^2 - 2\mu v)/c}\, dH(v).$

If the region $B$ consists of all points which satisfy at least one of the inequalities
$y \le -v_0$, $y \ge v_0$, and if $v_0 > 0$ is chosen such that the size of $B$ is equal to $\alpha$,
then $H(v) = \Phi(v)$, where $\Phi(v)$ is defined by the equations (89)-(92). Since the
latter test is a uniformly most powerful unbiased test,⁵ for any $\mu$ the inequality

(95)  $\int_{-\infty}^{\infty} e^{-\frac{1}{2}(\mu^2 - 2\mu v)/c}\, d[H(v) - \Phi(v)] \le 0$

holds. Let

$\psi(v) = \frac{1}{\sqrt{2\pi c}} \int_{-\infty}^{v} e^{-\frac{1}{2} t^2/c}\, dt.$

It is obvious that

(96)  $H(v_2) - H(v_1) \le \psi(v_2) - \psi(v_1)$ for $v_2 > v_1$,

(97)  $\int_{-\infty}^{\infty} dH(v) = \alpha,$

and

(98)  $\int_{-\infty}^{\infty} e^{-\frac{1}{2}(\mu^2 - 2\mu v)/c}\, dH(v)$ has a minimum for $\mu = 0$.

On the other hand, if $K(v)$ is a monotonically non-decreasing non-negative
function of $v$ such that

(96')  $K(v_2) - K(v_1) \le \psi(v_2) - \psi(v_1)$ for $v_2 > v_1$,

(97')  $\int_{-\infty}^{\infty} dK(v) = \alpha,$

(98')  $\int_{-\infty}^{\infty} e^{-\frac{1}{2}(\mu^2 - 2\mu v)/c}\, dK(v)$ has a minimum for $\mu = 0$,

then there exists a sequence $\{B^{(i)}\}$ $(i = 1, 2, \dots,$ ad inf.$)$ of unbiased regions
of size $\alpha$ such that

$\lim_{i=\infty} H^{(i)}(v) = K(v)$

uniformly in $v$. Since (95) holds for $H(v) = H^{(i)}(v)$ $(i = 1, 2, \dots,$ ad inf.$)$,
and since

$H^{(i)}(v_2) - H^{(i)}(v_1) \le \psi(v_2) - \psi(v_1)$ for $v_2 > v_1$,

it is easy to see that (95) holds also for $H(v) = K(v)$. Hence for any
monotonically non-decreasing non-negative function $K(v)$ for which (96'), (97'), and
(98') are fulfilled, also (95) must be fulfilled if we substitute $K(v)$ for $H(v)$.


5 J. Neyman and E. S. Pearson, l. c., p. 29.





Since $F(v)$ is a distribution function which satisfies (96'), (97') and (98'), we
have a contradiction to (93). This proves Theorem 3.
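
The role of unbiasedness in Theorem 3 can be seen from the closed form of the limiting powers. Carrying out the integral in (88) with $\Phi$ given by (89) to (92) yields the probability that a normal variate with mean $\mu$ and variance $c$ falls outside $(-z_0, z_0)$. The sketch below evaluates this, together with the analogous one-sided expression; the values $c = 1$, $\alpha = 0.05$ and the function names are assumptions made for illustration.

```python
from statistics import NormalDist

STD = NormalDist()

def two_sided_power(mu, c=1.0, alpha=0.05):
    """Limiting power of the region |z| >= z0 against theta_0 + mu / sqrt(n).

    Closed form of the integral in (88) with Phi given by (89)-(92): the
    probability that a N(mu, c) variate falls outside (-z0, z0).
    """
    sd = c ** 0.5
    z0 = sd * STD.inv_cdf(1 - alpha / 2)     # (92): Phi(-z0) = alpha / 2
    return STD.cdf((-z0 - mu) / sd) + 1 - STD.cdf((z0 - mu) / sd)

def one_sided_power(mu, c=1.0, alpha=0.05):
    """Limiting power of the region z >= z0 of size alpha, as in (72)-(75)."""
    sd = c ** 0.5
    z0 = sd * STD.inv_cdf(1 - alpha)
    return 1 - STD.cdf((z0 - mu) / sd)

if __name__ == "__main__":
    for mu in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(mu, two_sided_power(mu), one_sided_power(mu))
```

The two-sided power has its minimum $\alpha$ at $\mu = 0$, which is condition (98), whereas the one-sided power drops below $\alpha$ for negative $\mu$; the one-sided region is therefore excluded from the class considered in Theorem 3 even though it is more powerful for positive $\mu$.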


5. Appendix. Proof of the uniform consistency of $\hat\theta_n$. It will be shown here
that under certain conditions on the density function $f(x, \theta)$, Assumption 1,
i.e. uniform consistency of $\hat\theta_n$, can be proved.

For any open subset $\omega$ of the $\theta$-axis we denote by $\varphi(x, \omega)$ the least upper
bound, and by $\psi(x, \omega)$ the greatest lower bound of $\frac{\partial^2 \log f(x, \theta)}{\partial \theta^2}$ with respect
to $\theta$ in the set $\omega$. For any function $\lambda(x)$ we denote by $E_\theta \lambda(x)$ the expected value
of $\lambda(x)$ under the assumption that $\theta$ is the true value of the parameter, i.e.

$E_\theta \lambda(x) = \int_{-\infty}^{\infty} \lambda(x) f(x, \theta)\, dx.$

Denote furthermore by $P(\hat\theta_n \in \omega \mid \theta)$ the probability that $\hat\theta_n$ will fall in $\omega$ under
the assumption that $\theta$ is the true value of the parameter. Finally denote by $\Omega$
the parameter space and assume that $\Omega$ is either the whole real axis or a
subset of it.

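PROPOSITION 3. $\hat\theta_n$ is a uniformly consistent estimate of $\theta$, i.e. for any positive $k$

$\lim_{n=\infty} P(-k < \hat\theta_n - \theta < k \mid \theta) = 1$

uniformly for all $\theta$ in $\Omega$, if the following two conditions are fulfilled:

Condition I. For all values $\theta$ in $\Omega$

$\int_{-\infty}^{\infty} \frac{\partial f(x, \theta)}{\partial \theta}\, dx = \int_{-\infty}^{\infty} \frac{\partial^2 f(x, \theta)}{\partial \theta^2}\, dx = 0.$

Condition II. For any value $\theta$ in $\Omega$ there exists an open interval $\omega(\theta)$ containing $\theta$
and having the following three properties:

II_a. $\lim_{n=\infty} P[\hat\theta_n \in \omega(\theta) \mid \theta] = 1$ uniformly for all $\theta$ in $\Omega$.

II_b. $E_\theta \varphi^2[x, \omega(\theta)]$ is a bounded function of $\theta$ in $\Omega$, and the least upper bound $A$ of
$E_\theta \varphi[x, \omega(\theta)]$ with respect to $\theta$ in $\Omega$ is negative.

II_c. $E_\theta \psi[x, \omega(\theta)]$ is a bounded function of $\theta$ in the set $\Omega$.

Condition I means simply that we may differentiate under the integral sign.
In fact

$\int_{-\infty}^{\infty} f(x, \theta)\, dx = 1$

identically in $\theta$. Hence

$\frac{\partial}{\partial \theta} \int_{-\infty}^{\infty} f(x, \theta)\, dx = \frac{\partial^2}{\partial \theta^2} \int_{-\infty}^{\infty} f(x, \theta)\, dx = 0.$

Differentiating under the integral sign, we obtain Condition I.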
























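In case that $\omega(\theta)$ is the whole axis Condition II_a reduces to the condition
that $\hat\theta_n$ exists.

In order to prove Proposition 3, we show first that for any positive $\eta$

(99)  $\lim_{n=\infty} P\left[ \left| \frac{1}{n} \sum_{\alpha=1}^{n} \frac{\partial \log f(x_\alpha, \theta)}{\partial \theta} \right| < \eta \,\Big|\, \theta \right] = 1$

uniformly for all $\theta$ in $\Omega$. We have on account of Condition I

(100)  $E_\theta \frac{\partial \log f(x, \theta)}{\partial \theta} = E_\theta \left[ \frac{\partial f(x, \theta)}{\partial \theta} \Big/ f(x, \theta) \right] = \int_{-\infty}^{\infty} \frac{\partial f(x, \theta)}{\partial \theta}\, dx = 0.$

Since

$\frac{\partial^2 \log f(x, \theta)}{\partial \theta^2} = \frac{\partial}{\partial \theta} \left[ \frac{\partial f(x, \theta)}{\partial \theta} \Big/ f(x, \theta) \right] = \frac{\partial^2 f(x, \theta)}{\partial \theta^2} \Big/ f(x, \theta) - \left[ \frac{\partial f(x, \theta)}{\partial \theta} \Big/ f(x, \theta) \right]^2,$

we have on account of Condition I

(101)  $E_\theta \left[ \frac{\partial \log f(x, \theta)}{\partial \theta} \right]^2 = -E_\theta \frac{\partial^2 \log f(x, \theta)}{\partial \theta^2}.$

According to Condition II $E_\theta \psi[x, \omega(\theta)] < 0$ and is a bounded function of $\theta$.
Since $E_\theta \frac{\partial^2 \log f(x, \theta)}{\partial \theta^2} < 0$ and $\ge E_\theta \psi[x, \omega(\theta)]$, the left hand side of (101), i.e.
the variance of $\frac{\partial \log f(x, \theta)}{\partial \theta}$, is a bounded function of $\theta$. From this and the
equation (100) we obtain easily (99). Consider the Taylor expansion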





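(102)  $\frac{1}{n} \sum_\alpha \frac{\partial \log f(x_\alpha, \theta)}{\partial \theta} + (\hat\theta_n - \theta) \frac{1}{n} \sum_\alpha \frac{\partial^2 \log f(x_\alpha, \theta_n')}{\partial \theta^2} = 0,$

where $\theta_n'$ lies in the interval $[\theta, \hat\theta_n]$. Let $\epsilon$ be an arbitrary positive number and
denote by $Q_n(\theta)$ the region defined by the inequality

(103)  $\left| \frac{1}{n} \sum_\alpha \frac{\partial \log f(x_\alpha, \theta)}{\partial \theta} \right| < \epsilon.$

On account of (99) we have

(104)  $\lim_{n=\infty} P[Q_n(\theta) \mid \theta] = 1$

uniformly for all $\theta$ in $\Omega$.
Denote by $R_n(\theta)$ the region defined by the inequality

(105)  $\frac{1}{n} \sum_\alpha \varphi[x_\alpha, \omega(\theta)] < \tfrac{1}{2}A < 0.$

On account of Condition II_b

(106)  $\lim_{n=\infty} P[R_n(\theta) \mid \theta] = 1$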
















uniformly for all $\theta$ in $\Omega$. Denote by $B_n(\theta)$ the region in which $\hat\theta_n \in \omega(\theta)$. Since
in $B_n(\theta)$

$\frac{1}{n} \sum_\alpha \frac{\partial^2 \log f(x_\alpha, \theta_n')}{\partial \theta^2} \le \frac{1}{n} \sum_\alpha \varphi[x_\alpha, \omega(\theta)],$

we have in the intersection $R_n'(\theta)$ of $R_n(\theta)$ and $B_n(\theta)$

(107)  $\frac{1}{n} \sum_\alpha \frac{\partial^2 \log f(x_\alpha, \theta_n')}{\partial \theta^2} < \tfrac{1}{2}A < 0.$

Denote by $U_n(\theta)$ the intersection of $Q_n(\theta)$ and $R_n'(\theta)$. It is obvious that

(108)  $\lim_{n=\infty} P[U_n(\theta) \mid \theta] = 1$

uniformly for all $\theta$ in $\Omega$. From (102), (103) and (107) we get that in $U_n(\theta)$

(109)  $|\hat\theta_n - \theta| < \frac{2\epsilon}{|A|}.$

Hence on account of (108)

$\lim_{n=\infty} P\left( |\theta - \hat\theta_n| < \frac{2\epsilon}{|A|} \,\Big|\, \theta \right) = 1$

uniformly for all $\theta$ in $\Omega$. Since $\epsilon$ can be chosen arbitrarily, Proposition 3 is
proved.

Conditions I and II are sufficient but not necessary for the uniform
consistency of $\hat\theta_n$. For sufficiently small $\omega(\theta)$ the conditions II_b and II_c are rather
weak. In fact, on account of (101) we have

$E_\theta \frac{\partial^2 \log f(x, \theta)}{\partial \theta^2} < 0.$

Hence for sufficiently small intervals $\omega(\theta)$, under certain continuity conditions,
also $E_\theta \varphi[x, \omega(\theta)]$ will be negative. However, in some cases it may be difficult to
verify II_a for small $\omega(\theta)$. On the other hand, for sufficiently large $\omega(\theta)$
(certainly for $\omega(\theta) = [-\infty, +\infty]$) II_a can easily be verified, but the conditions II_b
and II_c might be unnecessarily strong. In cases where II_b or II_c does not hold
for $\omega(\theta) = [-\infty, +\infty]$ and the validity of II is not apparent, the following
Lemma may be useful:

Lemma: Proposition 3 remains valid if we substitute for Condition II the con- 
ditions 

II'. Denote by $T_n$ the set of all points at which $\hat\theta_n$ exists and

(110)  $$\sum_\alpha \frac{\partial}{\partial\theta^*} \log f(x_\alpha, \theta^*) = 0$$

has at most one solution in $\theta^*$. Then $\lim P[T_n \mid \theta] = 1$ uniformly for all $\theta$ in $\Omega$, and

II''. There exists a positive $k$ such that for $\omega(\theta) = I(\theta) = (\theta - k, \theta + k)$ the
following two conditions hold:

II''$_1$. $E_\theta\varphi[x, I(\theta)]$ is a bounded function of $\theta$ in $\Omega$ and the least upper bound $A$
of $E_\theta\varphi[x, I(\theta)]$ with respect to $\theta$ in $\Omega$ is negative.

II''$_2$. $E_\theta\psi[x, I(\theta)]$ is a bounded function of $\theta$ in the set $\Omega$.

In cases where II$_2$ or II$_3$ is not fulfilled for $\omega(\theta) = [-\infty, +\infty]$ the verification of II' and II'' may
be easier than that of II.


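Our Lemma can be proved as follows: Consider the Taylor expansion

(111)  $$\frac{1}{n}\sum_\alpha \frac{\partial}{\partial\theta^*} \log f(x_\alpha, \theta^*) = \frac{1}{n}\sum_\alpha \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta) + (\theta^* - \theta)\,\frac{1}{n}\sum_\alpha \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta')$$

where $\theta'$ lies in $[\theta, \theta^*]$. Denote by $V_n(\theta)$ the region defined by

(112)  $$\frac{1}{n}\sum_\alpha \varphi[x_\alpha, I(\theta)] < \tfrac{1}{2}A < 0.$$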


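On account of II''$_1$ we have

(113)  $$\lim_{n=\infty} P[V_n(\theta) \mid \theta] = 1$$

uniformly for all $\theta$ in $\Omega$. Let $W_n(\theta)$ be the region defined by

(114)  $$\left|\frac{1}{n}\sum_\alpha \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta)\right| < \epsilon.$$

From Condition I and Condition II'' it follows easily that

(115)  $$\lim_{n=\infty} P[W_n(\theta) \mid \theta] = 1$$

uniformly for all $\theta$ in $\Omega$. For all values $\theta^*$ in the interval $I(\theta)$ we have

(116)  $$\frac{1}{n}\sum_\alpha \varphi[x_\alpha, I(\theta)] \geq \frac{1}{n}\sum_\alpha \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta').$$

Because of (112) and (116) we have in $V_n(\theta)$

(117)  $$\frac{1}{n}\sum_\alpha \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta') < \tfrac{1}{2}A < 0$$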

for all values $\theta^*$ in the interval $I(\theta)$. Let $\epsilon$ be less than $|\tfrac{1}{2}kA|$. Then in the
intersection $W'_n(\theta)$ of the regions $V_n(\theta)$ and $W_n(\theta)$ we obviously have on account
of (114) that the values of the left hand side of (111) for $\theta^* = \theta + k$ and $\theta^* =
\theta - k$ will be of opposite sign. Hence at any point of $W'_n(\theta)$ the equation (110)
has at least one root which lies in the interval $I(\theta)$. Since (110) has at most
one root in $T_n$, and since $\hat\theta_n$ is a root of (110), we get that at any point of the
intersection $W''_n(\theta)$ of $W'_n(\theta)$ and $T_n$, $\hat\theta_n$ lies in $I(\theta)$. Since

(118)  $$\lim_{n=\infty} P[W''_n(\theta) \mid \theta] = 1 \quad \text{uniformly for all } \theta \text{ in } \Omega,$$

also

(119)  $$\lim_{n=\infty} P[\hat\theta_n \in I(\theta) \mid \theta] = 1$$

uniformly for all $\theta$ in $\Omega$. The relation (119) combined with the conditions II''$_1$
and II''$_2$ is equivalent to Condition II. Hence our Lemma is proved.





EXPERIMENTAL DETERMINATION OF THE MAXIMUM OF A
FUNCTION¹


By Harold Hotelling
Columbia University, New York City 


1. The necessary background for efficient experimental determinations. We 
shall deal with the problem of arranging an experiment for determining the 
value of $x$ for which an unknown function $f(x)$ is a maximum or minimum.
This problem is to be distinguished from those of estimating the maximum or 
minimum itself, and of studying the distributions of such estimates, problems 
to which Bernstein [1] and Rice [2] have contributed. 

The range of applications in which determinations of maximizing and mini- 
mizing values are important is extremely wide. Among these are the deter- 
mination of the time of year at which the number of algae or bacilli in a lake 
is a maximum, and the amount of fertilizers and of irrigation water making the 
yield of a crop a maximum. The magnetic permeabilities of permalloys, per- 
minvars and permendurs as functions of the induction, and the hardness of a 
copper-iron alloy as a function of the time of aging at 500°C., possess smooth 
maxima having interest in telephony, [3], [4]. The effective range of a gun is a 
function of the speed of burning of the powder, a variable which can be con- 
trolled. Almost every entrepreneur has a fervent desire to know the selling 
prices that will yield a maximum profit, and a few have undertaken controlled 
experiments with a view to finding out. There are also numerous practical 
problems of minimizing costs; for example, the cost of operating a ship as a 
function of its speed possesses a minimum. We shall confine our attention 
chiefly to the experimental determination of maxima, since such problems seem 
to occur naturally with greater frequency in applications; there is no loss of 
generality in this, since $f(x)$ has a maximum where $-f(x)$ has a minimum.

We shall assume that, for each value of $x$ in the set we shall select, one or
more observations will be made on $y = f(x)$, and that these observations are
afflicted with errors which are independently distributed about zero with a
common variance $\sigma^2$. From this it follows that if $f(x)$ is a linear function of
known functions of $x$, with unknown coefficients $\beta_0, \beta_1, \cdots, \beta_p$ (for example
a polynomial in $x$), the most efficient method of fitting is the method of least
squares, which yields unbiased estimates $b_0, \cdots, b_p$ of $\beta_0, \cdots, \beta_p$ having the
least possible variances; this is true whether or not the errors are normally
distributed. If the fourth moment of the errors is finite, and if the number $N$


1 Presented at the joint meeting of the Institute of Mathematical Statistics and the 
American Mathematical Society at Hanover, September 10, 1940. 




of observations is large, the estimated coefficients will be distributed in an
approximately normal manner; and so also will any function of them that is
regular in a fixed neighborhood of its "population value." By the "population
value" of a function $\phi(b_0, \cdots, b_p)$ we mean $\phi(\beta_0, \cdots, \beta_p)$. In particular, if

$$f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_p x^p$$

has a maximum for $x = \xi$ of the simplest type, such that $f'(\xi) = 0$ and $f''(\xi) < 0$,
so that $\xi$ is a simple root of the equation

$$f'(\xi) = \beta_1 + 2\beta_2 \xi + \cdots + p\beta_p \xi^{p-1} = 0,$$

and if $x_0$ is an estimate of $\xi$ found from the polynomial fitted by the method of
least squares, so that

$$b_1 + 2b_2 x_0 + \cdots + p b_p x_0^{p-1} = 0,$$

this last equation defines $x_0$ as a function of $b_1, \cdots, b_p$. The function is, to
be sure, multiple-valued when $p > 2$; but for sufficiently large values of $N$ the
probability will become arbitrarily great that the roots obtained from a random
experiment will each differ by an arbitrarily small quantity from one of the roots
of $f'(x) = 0$. Then provided we have a sufficient preliminary approximate knowledge
of $\xi$, we may choose the root nearest $\xi$; and the probability distribution
of this root, which in nearly all experiments will be a single-valued function

$$\phi(b_1, \cdots, b_p),$$

will approach normality of form, with standard error of order $N^{-1/2}$, about a
mean differing from

$$\xi = \phi(\beta_1, \cdots, \beta_p)$$

at most by terms of order $N^{-1}$, which are thus negligible in comparison with the
standard error. The situation will be effectively the same if, without knowing $\xi$
in advance even approximately, we choose the root $x_0$ giving the greatest value
$f(x_0)$, provided $f(\xi)$ is greater than any other value of $f(x)$.
From these considerations it appears advisable, whenever the unknown func- 
tion is capable of being represented adequately by a polynomial of degree p 
considerably less than the number N of observations, to fit a polynomial of 
degree p by least squares, and from it to determine the maximizing value by 
differentiation. In practice, however, there are obstacles to carrying out such 
a procedure with confidence. The form of the function is usually not known; 
it is far from clear what value should be given p even if the function is to be 
regarded as a polynomial; the use of a polynomial which does not give a suffi- 
ciently good fit, with observations taken at a considerable distance from the 
maximizing value, perhaps separated from it by other maxima and minima, 
appears to be a highly dubious proceeding; and if p is taken large, the labor of 
calculation becomes excessive. For all these reasons it is desirable to assign
the values of $x$ which are to be the basis of the experimental work close enough to
the maximizing value $\xi$ so that a polynomial of very low degree will fit adequately
in the neighborhood.

We shall restrict ourselves to functions having continuous derivatives of all
relevant orders² in a neighborhood of $\xi$. Such a function can in a sufficiently
small neighborhood be approximated by a polynomial of the second degree.
The necessity of using a polynomial of higher degree can therefore be avoided,
when a fairly good knowledge of the function is already in hand, and when the
number $N$ of observations that can be made is large enough, by choosing all
the values of $x$ in a sufficiently small neighborhood of $\xi$. We shall suppose that
this is done; that is, a regression equation

$$Y = b_0 + b_1 x + b_2 x^2$$

is fitted by least squares to a large number of observations after choosing the
values of $x$ quite close to the true maximizing value $\xi$; and the estimate $x_0$ of $\xi$
is a solution of $dY/dx = b_1 + 2b_2 x = 0$, so that

$$x_0 = -\frac{b_1}{2b_2}.$$

We shall examine the errors in $x_0$ arising both from the inadequacy that may
exist in the quadratic approximation and from the random errors of observation,
and shall consider what distribution of $x$ may most appropriately be chosen to
reduce the errors of both kinds, and to place them in a suitable balance with
each other.
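
The quadratic fit and the resulting estimate of the maximizing value can be illustrated by a minimal sketch. It assumes exact rational arithmetic and solves the three normal equations by Cramer's rule; the function names `det3`, `fit_quadratic` and `maximizing_value`, and the noise-free data, are hypothetical choices made for illustration and are not taken from the paper.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_quadratic(xs, ys):
    """Least-squares coefficients (b0, b1, b2) of Y = b0 + b1*x + b2*x^2."""
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(y) for y in ys]
    # Power sums S x^k and right-hand sides S x^k y of the normal equations.
    a = [sum(x ** k for x in xs) for k in range(5)]
    rhs = [sum(x ** k * y for x, y in zip(xs, ys)) for k in range(3)]
    matrix = [[a[i + j] for j in range(3)] for i in range(3)]
    d = det3(matrix)
    if d == 0:
        raise ValueError("at least three distinct values of x are required")
    coeffs = []
    for col in range(3):  # Cramer's rule, one unknown at a time
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        coeffs.append(det3(replaced) / d)
    return tuple(coeffs)


def maximizing_value(xs, ys):
    """Estimate x0 = -b1 / (2*b2); the fitted parabola must open downward."""
    _, b1, b2 = fit_quadratic(xs, ys)
    if b2 >= 0:
        raise ValueError("fitted parabola has no maximum (b2 >= 0)")
    return -b1 / (2 * b2)


# Hypothetical noise-free observations of f(x) = 5 - (x - 2)^2.
xs = [0, 1, 2, 3, 4]
ys = [5 - (x - 2) ** 2 for x in xs]
x0 = maximizing_value(xs, ys)
```

With errors of observation added to `ys`, the same call would return an estimate scattered about the true maximizing value, which is the situation the rest of the discussion analyzes.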

It will be observed that a fairly definite preliminary knowledge of the function 
under investigation is required for such a program. Any criterion for the selection
of values of $x$ for experimentation must involve not only the value of $\xi$
but also the values of the first few derivatives in a neighborhood of $\xi$, or some
similar information. The requirement of preliminary information is essential 
for the efficient design of experiments in general. For instance the efficiency 
of an agricultural field experiment depends on the correctness of the appraisal, 
before the experiment is laid down, of the general nature of the fertility gradients 
likely to exist in the field and of the variances due to error and main effects 
which will be revealed more accurately by the experiment itself. If the pre- 




2 Other cases may well arise in practice and deserve separate consideration in connection 
with the particular investigations in which they arise. For example various physical 
properties of alloys, regarded as functions of the proportion of a particular constituent, 
have maxima, but may have discontinuous derivatives because of the phenomena of crys- 
tallization and solution of one metal in another. The assumptions appropriate to an in- 
vestigation, parallel to that of the present paper, of the proper organization of experiments 
for finding such metallurgical maxima must be drawn from metallurgy. The case of
continuous derivatives is however of widespread importance. If no regularity assumption is
made about the function, one set of $N$ values of $x$ is as good as another, and no set is likely
to tell us very much about the function if it is one of the violently irregular ones utilized 
in the theory of functions to emphasize the necessity of studying that subject. 







liminary information is incorrect, a properly arranged self-contained experiment 
will nevertheless give results which are valid, in the sense that the significance 
probabilities calculated from them by accurate methods are correct, but will be 
inefficient, in the sense that another experiment of the same cost, based on better 
preliminary information, would be more likely to detect real effects through the 
smallness of such a calculated probability. The efficient conduct of experi- 
mentation thus proceeds in stages of ascending magnitude. A large-scale in- 
vestigation should be preceded by a smaller one designed primarily to obtain 
information for use in designing the large one. The small preliminary investiga- 
tion may well in turn be preceded by a still smaller pre-preliminary investigation, 
and so on,³ like an army marching after an advance guard, which follows a more
advanced smaller detachment, which follows a still smaller and still more ad- 
vanced unit, which follows a “point.” At the very beginning of the process of 
chain experimentation will stand work based on little or no clear information 
of the kind required for efficient design. This first phase will be speculative 
and exploratory in character. Neither its cost nor its accuracy can well be 
estimated in advance. It is a favorite, but not exclusive, preoccupation of men 
of genius. Many of its results turn out to be worthless. But it is an essential 
preliminary to well-organized research directed to definite aims defined qualita- 
tively in advance. 

After the first speculative and unsystematic phase in the knowledge of a 
subject is past, but before the careful, economical organization of an accurate 
investigation, an intermediate type of exploration is needed to supply estimates 
of the parameters required for the design of the full-scale investigation. In the 
present case such a systematic though small-scale experiment might perhaps
consist in dividing a range within which the desired maximizing value $\xi$ is known
to lie into equal parts, making at least two observations at each of the ends of
these intervals, and fitting a polynomial of at least the fifth degree by least
squares. This will make possible estimates of the parameters $\sigma, \beta_1, \beta_2, \cdots, \beta_5$
(and hence of $\xi$) required for using the efficient designs which we shall obtain.
At least six different values of x are required for fitting the polynomial of the 
fifth degree. The fitting process is facilitated by taking them in arithmetic 
progression and using orthogonal polynomials. 


3 A remarkable example of such a series of investigations is the chain of sample censuses 
of area of jute in Bengal carried out for the Indian Central Jute Committee under the 
direction of Prof. P. C. Mahalanobis annually beginning in 1937. Each year’s work is 
designed primarily to obtain information for planning the next year’s, and a sequence of 
four or five such investigations, each considerably larger than the preceding, is planned 
to lead up to an eventual annual sampling of the whole immense jute area in the province. 
A partial account of this is given in [5], a fuller one in confidential but printed reports of 
the Indian Central Jute Committee, Calcutta. 

Certain multiple-sample schemes in manufacturing inspection also provide good 
examples of chain experiments, [6]. 







2. Sampling errors and bias in the quadratic approximation. Let us measure
all values of $x$ from the value $\xi$ under investigation which makes $f(x)$ a maximum.
Then $\xi = 0$, and in the expansion

(1)  $$f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \cdots$$

we shall have $\beta_1 = 0$ and $\beta_2 \leq 0$; we shall assume that $\beta_2 < 0$. An observation
$y_\alpha$ corresponding to a chosen value $x_\alpha$ will have, by assumption, an error $\Delta_\alpha$ of
zero expectation and variance $\sigma^2$, such that

(2)  $$y_\alpha = f(x_\alpha) + \Delta_\alpha.$$

A quadratic estimate 

(3)  $$Y = b_0 + b_1 x + b_2 x^2$$

of $f(x)$ is obtained by means of normal equations which may be written

(4)  $$\begin{aligned} a_0 b_0 + a_1 b_1 + a_2 b_2 &= Sy \\ a_1 b_0 + a_2 b_1 + a_3 b_2 &= Sxy \\ a_2 b_0 + a_3 b_1 + a_4 b_2 &= Sx^2 y, \end{aligned}$$

where $S$ stands for summation over all the observations, so that, for example,
$Sy = \sum y_\alpha = y_1 + y_2 + \cdots + y_N$, and where

(5)  $$a_k = Sx^k.$$


In particular, $a_0 = N$. A determinate solution is possible only if there are at
least three distinct values of $x$; we shall always suppose therefore that this is
the case. This is equivalent to assuming that the determinant $a$ of the coefficients
in (4) is not zero. A greater number of observations $y$ is necessary to
obtain an estimate of the variance $\sigma^2$, and furthermore we shall suppose this
number large in our approximations, but since repeated observations may be
made for each value of $x$, it is not essential that there be more than three values
of $x$ in the distribution to be selected.
If we put 


(6)  $$\delta b_k = b_k - \beta_k, \qquad \gamma_k = Sx^k\Delta,$$

for $k = 0, 1, 2$, substitute (1) in (2) and the result in (4), and utilize (5) and
(6), we obtain

(7)  $$\begin{aligned} a_0\,\delta b_0 + a_1\,\delta b_1 + a_2\,\delta b_2 &= \gamma_0 + a_3\beta_3 + a_4\beta_4 + \cdots \\ a_1\,\delta b_0 + a_2\,\delta b_1 + a_3\,\delta b_2 &= \gamma_1 + a_4\beta_3 + a_5\beta_4 + \cdots \\ a_2\,\delta b_0 + a_3\,\delta b_1 + a_4\,\delta b_2 &= \gamma_2 + a_5\beta_3 + a_6\beta_4 + \cdots \end{aligned}$$

From these equations it follows that the errors $\delta b_k$ are homogeneous linear functions
of the right-hand members and will therefore be small if the quantities on
the right are small. Of these quantities, the $\gamma_k$'s will be stochastically of the
order $N^{1/2}$ for large samples with any fixed set of values of $x$. When the equations
are solved, their coefficients will be of the order of $N^{-1}$, so that the product
is of order $N^{-1/2}$, and becomes negligible if $N$ is large enough. The coefficients
$a_k$ of $\beta_3, \beta_4, \cdots$ can be kept small if the values of $x$ are chosen to lie within a sufficiently
restricted range. Of course the coefficients $a_k$ in the left members of (7) will
also be small in this case, but not small enough to offset fully the smallness of
those on the right. To see this, we observe that if all the values of $x$ be multiplied
by any quantity $g$, $a_k$ is multiplied by $g^k$, while

(8)  $$a = \begin{vmatrix} a_0 & a_1 & a_2 \\ a_1 & a_2 & a_3 \\ a_2 & a_3 & a_4 \end{vmatrix}$$

is multiplied by $g^6$. The cofactors of the last column are proportional respectively
to $g^4$, $g^3$ and $g^2$. Hence, in the expression for $\delta b_2$, the coefficient of $\beta_3$ is
of order $g$, that of $\beta_4$ is of order $g^2$, and so on, the coefficients of the $\beta$'s of higher
orders vanishing more and more rapidly with $g$ as we go on in the sequence.
The like is true of $\delta b_1$ and $\delta b_0$, which vanish even more rapidly with $g$. Thus
we may, by restricting sufficiently the range of $x$ on the basis of the assumed
preliminary knowledge of the function, and taking a sufficiently large sample
of observations, bring it about that the probability will be arbitrarily close to
unity that the $\delta b_k$'s are less than any assigned limits.

Let us, in particular, restrict the range sufficiently and take a large enough
sample to make it reasonable to regard $\delta b_2$ as negligible in comparison with $\beta_2$.
The error in the estimate

(9)  $$x_0 = -\frac{b_1}{2b_2}$$

of the maximizing value $\xi$ will, since we are taking $\xi = 0$, be $x_0$ itself, and may
be written

$$-\frac{\delta b_1}{2(\beta_2 + \delta b_2)} = -\frac{\delta b_1}{2\beta_2}\left(1 - \frac{\delta b_2}{\beta_2} + \cdots\right),$$

where the terms other than 1 in the last parentheses are negligible. The problem
of minimizing the error in $x_0$ is then virtually equivalent to minimizing the error
$\delta b_1$. In section 5 it will be shown that it is not until we reach terms of the
order of $g^3$ that the errors $\delta b_2$ need be taken into account. We shall first discuss
the errors in $x_0$ of lower orders in $g$, and thus confine the discussion to $\delta b_1$. For
the present we shall take as the quantity to be made as small as possible the
expectation of the square of this last error, $E(\delta b_1)^2$. This is not the same as the
variance of $b_1$, since $E\delta b_1$ is not in general zero. We have, in fact, by transposing
a familiar formula for the variance,

(10)  $$E(\delta b_1)^2 = (E\delta b_1)^2 + \sigma_{b_1}^2,$$













thus dividing our minimand into two parts, due respectively to the bias arising 
from the neglect of terms of third and higher orders, and to the usual sampling 
errors. 


By the usual least-square theory, the sampling variance of $b_1$ is

(11)  $$\sigma_{b_1}^2 = \mu\sigma^2,$$

where $\mu$ is the cofactor of the central element in $a$, divided by $a$, that is,

(12)  $$\mu = (a_0 a_4 - a_2^2)/a.$$

Since $\mu$ is of the order of $g^{-2}$, we may reduce the sampling variance as much as
we please by taking the values of $x$ sufficiently far removed from $\xi$. If $f(x)$ is
definitely known to be only of the second degree, a wide dispersion of the desirable
values of $x$ is thus indicated, since in this case $E\delta b_1 = 0$, as appears by taking
the expectation of each term in (7). But if, as will usually be the case,
$f(x)$ has terms of higher orders than the second, an excessively wide dispersion
may increase the bias $E\delta b_1$ to such an extent as to render the quadratic approximation
inapplicable.

In taking the expectation of each term of (7) and then solving for $\delta b_1$ we
obtain, since $E\gamma_k = 0$ according to the definition of $\gamma_k$, and because $E\Delta = 0$,
a result of the form

(13)  $$E\delta b_1 = B_3\beta_3 + B_4\beta_4 + B_5\beta_5 + \cdots$$

We shall call $B_3$, $B_4$, and $B_5$ respectively the cubic, quartic and quintic components
of the bias, or simply biases. If we denote by $\lambda, \mu, \nu$ the ratios to $a$
of the cofactors of the second column of $a$, so that

(14)  $$\lambda a_1 + \mu a_2 + \nu a_3 = 1,$$

we shall have for the components of bias,

(15)  $$\begin{aligned} B_3 &= \lambda a_3 + \mu a_4 + \nu a_5 \\ B_4 &= \lambda a_4 + \mu a_5 + \nu a_6 \\ B_5 &= \lambda a_5 + \mu a_6 + \nu a_7, \end{aligned}$$

and so forth. Since $\lambda$, $\mu$, and $\nu$ are of respective orders $-1$, $-2$ and $-3$ in a
multiplier $g$ of all the values of $x$, $B_3$ is of order 2, $B_4$ is of order 3, and the higher
biases are of higher orders. Thus if we begin with any particular distribution
of $x$ and apply a sufficiently small multiplier $g$, we can make the quartic bias
negligible in comparison with the cubic, the quintic in comparison with the
quartic, and so forth, provided none of these biases is zero. But in reducing
$g$ we increase the sampling variance, which is of the order of $g^{-2}$.
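
The orders in $g$ stated above can be checked numerically. The following is a minimal sketch, assuming a design is given simply as a list of $x$ values (repeats allowed); the function name `design_summary` and the example design are hypothetical. It computes the variance factor $\mu$ of (12) and the bias components of (15) from the power sums, then rescales the design by $g = 1/2$.

```python
from fractions import Fraction


def design_summary(xs):
    """Return (mu, B3, B4, B5) for a design given as a list of x values."""
    # Power sums a_0 .. a_7 of the design.
    a = [sum(Fraction(x) ** k for x in xs) for k in range(8)]
    # Determinant of the normal-equation matrix, formula (8).
    det = (a[0] * (a[2] * a[4] - a[3] ** 2)
           - a[1] * (a[1] * a[4] - a[2] * a[3])
           + a[2] * (a[1] * a[3] - a[2] ** 2))
    if det == 0:
        raise ValueError("at least three distinct values of x are required")
    # Cofactors of the second column of (8), each divided by the determinant.
    lam = -(a[1] * a[4] - a[2] * a[3]) / det
    mu = (a[0] * a[4] - a[2] ** 2) / det
    nu = (a[1] * a[2] - a[0] * a[3]) / det
    # Bias components, formula (15).
    bias = [lam * a[k] + mu * a[k + 1] + nu * a[k + 2] for k in (3, 4, 5)]
    return (mu, *bias)


# Hypothetical design and multiplier.
base = [-1, 0, 1, 3]
g = Fraction(1, 2)
mu0, b3_0, b4_0, b5_0 = design_summary(base)
mu_g, b3_g, b4_g, b5_g = design_summary([g * x for x in base])
```

Halving the spread of the design multiplies the variance factor by four while dividing the cubic, quartic and quintic biases by 4, 8 and 16, which is the trade-off the text goes on to balance.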

Under these conditions it is reasonable to consider what types of distribution 
having a fixed value of the sampling variance make the cubic bias a minimum 
in absolute value; then if there is more than one distribution of this kind, to 
seek among them a class minimizing the absolute value of the total of cubic and 

































quartic biases; and among these a class minimizing the absolute value of the 
total of cubic, quartic and quintic biases, with the modified meaning of the 
quintic bias taking account of $\delta b_2$.


3. The cubic and quartic biases. We find, somewhat unexpectedly, that
there exists a class of distributions of $x$ for which the cubic bias is actually zero.
To exemplify this we need give the variable no more than three different values,
which we may call $x$, $y$ and $z$, and we may assign to them the arbitrary frequencies
$k, m, n$ of experiments ($k + m + n = N$). If we put

(16)  $$P = \begin{vmatrix} 1 & 1 & 1 \\ x & y & z \\ x^2 & y^2 & z^2 \end{vmatrix} = (x - y)(y - z)(z - x),$$

and consider a matrix of three rows and $N$ columns, of which $k$ columns are
identical with the first column of $P$, $m$ with the second, and $n$ with the third,
it is evident that the sum of the squares of the three-rowed determinants in
this matrix is $kmnP^2$. But this sum of squares is also equal to the determinant
formed from the sums of products of the three rows, and this is $a$ (formula (8)).
Thus $a = kmnP^2 \neq 0$, since $x, y, z$ are all different. Together with the foregoing
$3 \times N$ matrix consider another,

(17)  $$\begin{Vmatrix} 1 & 1 & 1 \\ x^2 & y^2 & z^2 \\ x^3 & y^3 & z^3 \end{Vmatrix}$$

having $k$ columns identical with that first written, $m$ identical with the second
written, and $n$ identical with the third. The only non-vanishing three-rowed
determinants in this matrix are formed of these three different columns, and
equal $(xy + yz + zx)P$; there are $kmn$ of them. The sum of products of corresponding
three-rowed determinants in the two matrices is therefore
$kmnP^2(xy + yz + zx)$. But this sum is also equal to the determinant, formed
from the sums of products of corresponding rows,

$$\begin{vmatrix} a_0 & a_2 & a_3 \\ a_1 & a_3 & a_4 \\ a_2 & a_4 & a_5 \end{vmatrix},$$

which, by (15), equals $-aB_3$. It follows that

(18)  $$-B_3 = xy + yz + zx.$$
There are many real solutions of the equation

(19)  $$xy + yz + zx = 0,$$

with the three values all different, for example $-2, 3, 6$. If we assign such values
to our variable, and an arbitrary number of experimental determinations to each
of these values, the cubic bias $B_3$ will be zero.

It will be noticed that such a solution cannot have zero for one of the values.
If, for example, $z = 0$ in (19), then $x$ or $y$ must also vanish, in violation of the
condition that there must be at least three distinct values. Moreover a solution
cannot be symmetrical about zero; if $x + y = 0$ it follows from (19) that
$x = y = 0$. A solution may or may not be symmetrical about a value other
than zero. The values $3 - 2\sqrt{3}$, $(3 - \sqrt{3})/2$, $\sqrt{3}$ satisfy the equation and
are in arithmetic progression, while the solution $-2, 3, 6$ is asymmetrical.
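
As a numerical check on (18) and on the two example solutions of (19), the sketch below recomputes the cubic bias directly from the power sums of a three-point design. The function name `cubic_bias` and the particular frequencies are hypothetical; exact rational arithmetic is assumed, with a floating-point tolerance only for the example involving $\sqrt{3}$.

```python
from fractions import Fraction
import math


def cubic_bias(points, freqs):
    """B3 of formula (15) for a design taking points[i] exactly freqs[i] times."""
    a = [sum(n * Fraction(x) ** k for x, n in zip(points, freqs))
         for k in range(6)]
    det = (a[0] * (a[2] * a[4] - a[3] ** 2)
           - a[1] * (a[1] * a[4] - a[2] * a[3])
           + a[2] * (a[1] * a[3] - a[2] ** 2))
    if det == 0:
        raise ValueError("at least three distinct values of x are required")
    # Cofactors of the second column of (8), each divided by the determinant.
    lam = -(a[1] * a[4] - a[2] * a[3]) / det
    mu = (a[0] * a[4] - a[2] ** 2) / det
    nu = (a[1] * a[2] - a[0] * a[3]) / det
    return lam * a[3] + mu * a[4] + nu * a[5]


# The asymmetrical solution -2, 3, 6 with equal and with unequal frequencies.
equal_freq = cubic_bias([-2, 3, 6], [1, 1, 1])
unequal_freq = cubic_bias([-2, 3, 6], [5, 2, 9])

# A design that does not satisfy (19): xy + yz + zx = 2 + 8 + 4 = 14.
generic = cubic_bias([1, 2, 4], [3, 1, 2])

# The solution in arithmetic progression, evaluated in floating point.
r = math.sqrt(3)
progression = float(cubic_bias([3 - 2 * r, (3 - r) / 2, r], [1, 1, 1]))
```

The frequencies drop out of the result, as (18) indicates: only the three chosen values determine whether the cubic bias vanishes.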

If we modify (17) by replacing the cubes of the variables by their fourth
powers, and apply the same procedure to the modified matrix, we find that

(20)  $$B_4 = -(x + y)(y + z)(z + x).$$

Thus there exist sets of three distinct real values making the quartic bias vanish,
for example any set for which $x + y = 0$; but no such set can at the same time
nullify the cubic bias (18). Since it is ordinarily more important for the cubic
than for the quartic bias to vanish, distributions nullifying (20) are not in
general to be recommended. But in exceptional cases it may be known that
$\beta_3$ is zero, or very small in comparison with $\beta_4$, and then the vanishing of $B_4$
is a more valuable property than that of $B_3$. It will be shown that no distribution
of three or more values exists such that both the cubic and quartic components
of bias are zero.
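
Formula (20) can be checked in the same way as (18). This is a minimal sketch under the same assumptions (three-point designs, exact rational arithmetic); the function name `cubic_and_quartic_bias` and the example designs are hypothetical.

```python
from fractions import Fraction


def cubic_and_quartic_bias(points, freqs):
    """Return (B3, B4) of formula (15) for points[i] taken freqs[i] times."""
    a = [sum(n * Fraction(x) ** k for x, n in zip(points, freqs))
         for k in range(7)]
    det = (a[0] * (a[2] * a[4] - a[3] ** 2)
           - a[1] * (a[1] * a[4] - a[2] * a[3])
           + a[2] * (a[1] * a[3] - a[2] ** 2))
    if det == 0:
        raise ValueError("at least three distinct values of x are required")
    lam = -(a[1] * a[4] - a[2] * a[3]) / det
    mu = (a[0] * a[4] - a[2] ** 2) / det
    nu = (a[1] * a[2] - a[0] * a[3]) / det
    b3 = lam * a[3] + mu * a[4] + nu * a[5]
    b4 = lam * a[4] + mu * a[5] + nu * a[6]
    return b3, b4


# A set with x + y = 0: the quartic bias vanishes, the cubic bias does not.
sym_b3, sym_b4 = cubic_and_quartic_bias([-1, 1, 3], [1, 1, 1])

# A set satisfying xy + yz + zx = 0: the cubic bias vanishes, the quartic does not.
zero_b3, zero_b4 = cubic_and_quartic_bias([-2, 3, 6], [2, 1, 4])
```

For the second design the quartic bias equals $-(1)(9)(4) = -36$, which is also the product of the three values, so neither example nullifies both components at once.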

Let us denote by $D_p$ the $p$-rowed determinant having $a_{i+j-2}$ as the element
in its $i$th row and $j$th column. Thus $D_3$ is the same determinant which we have
in (8) called $a$, and

(21)  $$D_4 = \begin{vmatrix} a_0 & a_1 & a_2 & a_3 \\ a_1 & a_2 & a_3 & a_4 \\ a_2 & a_3 & a_4 & a_5 \\ a_3 & a_4 & a_5 & a_6 \end{vmatrix}.$$

For every distribution, every $D_p \geq 0$; and a necessary and sufficient condition
that a distribution have $p$ or more distinct values is that $D_p$ be greater than
zero [7, p. 362]. If $D_p$ is positive, so is each of its principal minors. In
particular, since we are requiring at least three values in a distribution, $D_3 =
a > 0$, and therefore

(22)  $$a_2 a_4 - a_3^2 > 0,$$


and 


(23) Wa. — a3 > 0. 







We shall now consider distributions for which the cubic bias $B_3$ is zero, and
consequently, by (15),

(24)  $$\lambda a_3 + \mu a_4 + \nu a_5 = 0,$$

and expand $D_4$. From the definition of $\lambda, \mu, \nu$, we have

(25)  $$\lambda a_2 + \mu a_3 + \nu a_4 = 0.$$

Multiply the last row of the determinant (21) by $\nu$, and add to it $\lambda$ times the
second row and $\mu$ times the third. The last row is thus, by (14), (25), (24)
and (15) transformed into

$$1 \quad 0 \quad 0 \quad B_4,$$

while the determinant has been multiplied by $\nu$. Let this new determinant be
expanded with respect to its last row. The cofactor of the first element 1 is

$$G = -\begin{vmatrix} a_1 & a_2 & a_3 \\ a_2 & a_3 & a_4 \\ a_3 & a_4 & a_5 \end{vmatrix}.$$

Let the last row of this determinant be multiplied by $\nu$, an operation having
the effect of multiplying the whole determinant by $\nu$; and let $\lambda$ times the first
row and $\mu$ times the second row then be added to the last. The last row is
thus, by (14), (25) and (24) reduced to

$$1 \quad 0 \quad 0.$$

Hence

$$\nu G = -(a_2 a_4 - a_3^2),$$

and consequently

(26)  $$\nu^2 D_4 = \nu(aB_4 + G) = \nu a B_4 - (a_2 a_4 - a_3^2).$$

Since the first member of this equation is positive or zero, (22) shows that it is
impossible that $B_4$ should equal zero when $B_3 = 0$ as we have assumed. That is,

Either the cubic or the quartic bias of every distribution having three or more distinct
values must be different from zero.

If $\nu$ were zero, (26) would contradict (22). Hence $\nu \neq 0$. With every distribution
of $x$ there is associated another obtained from it by changing the sign
of each value of $x$. Such a pair of distributions we shall call opposite. When
we pass from a distribution to its opposite, the power-sums $a_k$ remain unchanged
when $k$ is even and change only in sign when $k$ is odd. Since $a$ is
always positive, and since

(27)  $$\nu = (a_1 a_2 - a_0 a_3)/a,$$

$\nu$ has opposite signs and the same absolute value for opposite distributions.
The conclusions to be reached shortly will be equally valid for a distribution
and its opposite, and in reaching them we may assume $\nu > 0$. It will then
follow from (22) and (26) that $B_4 > 0$.


4. Distributions nullifying cubic bias with minimum quartic bias. We can 
now prove the following theorem: 

Among distributions for which the cubic bias vanishes and the standard error of
$b_1$ has a fixed value, those for which the quartic bias is a minimum have exactly
three distinct values of the variable. These values satisfy the equation

(28)  $$xy + yz + zx = 0.$$

Since the standard error $\sigma$ of a single observation is not affected by the distribution
chosen for $x$, fixation of the standard error of $b_1$ is equivalent by (11)
to fixation of the value of the expression $\mu$ given by (12). We suppose therefore
that $\mu$ has some fixed positive value and that $B_3 = 0$. Since $\mu$, $B_3$ and $B_4$ do
not involve the distribution of $x$ excepting through the power-sums $a_0, a_1, \cdots,
a_6$, we may treat these power-sums as the independent variables in trying to
make $B_4$ a minimum. Their region of variation is limited by the inequalities
referred to in the preceding section,

$$D_1 = a_0 > 0, \quad D_2 > 0, \quad D_3 = a > 0, \quad D_4 \geq 0.$$

The inequalities $D_p \geq 0$ for $p > 4$ involve power-sums of orders higher than
the sixth and are irrelevant to our purpose.

The definition (8) of $a$ shows that it is independent of $a_5$ and $a_6$; consequently
$\lambda$, $\mu$, and $\nu$ are also. According to (15), $B_3$ involves $a_5$ but not $a_6$; while of all
the expressions we have considered, only $B_4$ and $D_4$ are functions of $a_6$. Therefore
when $a_0, a_1, \cdots, a_5$ are given any definite values, $a_6$ may be chosen to
make $B_4$ a minimum without any regard to the fixed values of $\mu$ and $B_3$. Now
(15) shows that $B_4$ is a linear function of $a_6$ with a coefficient $\nu$ which, at the end
of the last section, we have proved not to be zero and assumed positive. Thus
$B_4$, which is also positive, is an increasing function of $a_6$. Its minimum will
correspond to the least value of $a_6$ consistent with the condition $D_4 \geq 0$. But
(21) shows that $D_4$ is also a linear function of $a_6$ with a positive coefficient,
$a$. The minimum of $a_6$, and therefore that of $B_4$, require therefore
that $D_4 = 0$. But $D_4 = 0$ is exactly the condition that there should be no more
than three distinct values in the distribution. Since there must be at least
three distinct values, and since if there are only three they must satisfy (19),
the theorem is proved.

The minimum value of $B_4$ with respect to variations of $a_6$ when $B_3 = 0$ may
be found by putting $D_4 = 0$ in (26). Designating this minimum by $b$ and
using (27) we have

(29)  $$b = \frac{a_2 a_4 - a_3^2}{a_1 a_2 - a_0 a_3},$$

where the numerator is intrinsically positive, and the denominator is positive
for the class of distributions we are now considering, though we might equally
well consider the opposite distributions, for which it is negative. We have also
from (20),

(30)  $$(x + y)(y + z)(z + x) = -b.$$

Substituting for each of these binomials its value as given by (28), we may write
this in the simpler form

(31)  $$xyz = b > 0.$$

It was shown at the beginning of section 3 that when there are only three
values in the distribution, with frequencies $k$ for $x$, $m$ for $y$, and $n$ for $z$,

(32)  $$a = kmnP^2 = kmn(x - y)^2(y - z)^2(z - x)^2.$$

The first two rows of (17) form a matrix such that the sum of the squares of
its two-rowed determinants is

(33)  $$mn(y^2 - z^2)^2 + nk(z^2 - x^2)^2 + km(x^2 - y^2)^2.$$

Since this is equal to the determinant of the sums of products of the rows,
namely

$$\begin{vmatrix} a_0 & a_2 \\ a_2 & a_4 \end{vmatrix},$$

it follows from (12), (32) and (33) that

(34)  $$\mu = \frac{(y + z)^2}{k(x - y)^2(x - z)^2} + \frac{(x + z)^2}{m(x - y)^2(y - z)^2} + \frac{(x + y)^2}{n(x - z)^2(y - z)^2}.$$

It is desired to minimize this expression, which is the factor of the variance
that is independent of the accuracy of the individual observations, while holding
$b = xyz$ fixed; or to minimize $b$ while holding $\mu$ fixed. In either case the
values of $x$, $y$ and $z$ are to be chosen to satisfy (28). The relations established
by the solution of either of these virtually equivalent problems will fix $x$, $y$, and $z$
except for a factor of proportionality, which must then be adjusted to provide
a balance as satisfactory as possible between random errors and bias.
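
The closed form for $\mu$ in a three-point design can be compared with its definition (12) through the power sums. The sketch below is illustrative only; the function names `mu_from_power_sums` and `mu_three_point` and the example designs are hypothetical, and exact rational arithmetic is assumed.

```python
from fractions import Fraction


def mu_from_power_sums(points, freqs):
    """mu = (a0*a4 - a2^2) / a, with a the determinant of formula (8)."""
    a = [sum(n * Fraction(x) ** k for x, n in zip(points, freqs))
         for k in range(5)]
    det = (a[0] * (a[2] * a[4] - a[3] ** 2)
           - a[1] * (a[1] * a[4] - a[2] * a[3])
           + a[2] * (a[1] * a[3] - a[2] ** 2))
    if det == 0:
        raise ValueError("at least three distinct values of x are required")
    return (a[0] * a[4] - a[2] ** 2) / det


def mu_three_point(x, y, z, k, m, n):
    """Closed form for mu with values x, y, z taken k, m, n times."""
    x, y, z = Fraction(x), Fraction(y), Fraction(z)
    return ((y + z) ** 2 / (k * (x - y) ** 2 * (x - z) ** 2)
            + (x + z) ** 2 / (m * (x - y) ** 2 * (y - z) ** 2)
            + (x + y) ** 2 / (n * (x - z) ** 2 * (y - z) ** 2))


# Hypothetical designs: one satisfying (28), one not.
direct_a = mu_from_power_sums([-2, 3, 6], [2, 3, 4])
closed_a = mu_three_point(-2, 3, 6, 2, 3, 4)
direct_b = mu_from_power_sums([1, 2, 4], [1, 1, 1])
closed_b = mu_three_point(1, 2, 4, 1, 1, 1)
```

The agreement does not depend on (28); that condition enters only when $\mu$ is minimized with $b = xyz$ held fixed.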




5. The quintic bias. Effect of $\delta b_2$. With any distribution determined in
this way will be associated its opposite distribution, which will have the same
minimizing properties so far as the variance and the cubic and quartic components
of bias are concerned. The appropriate choice between these two opposite
distributions will in general involve the quintic component of the bias.
At this point we must, for the first time, take account of the errors in the
denominator $b_2$ of $x_0$.

Since $b_1$ converges stochastically to $Eb_1$, and $b_2$ to $Eb_2$, the error
$x_0 = -\tfrac12 b_1/b_2$ converges stochastically (for large samples) to
$-\tfrac12 Eb_1/Eb_2$. By keeping our values of x close enough to the value sought
we may insure that $Eb_2$ differs as little as we please from $\beta_2$, and hence
that the series

     $\dfrac{Eb_1}{Eb_2} = \dfrac{Eb_1}{\beta_2}\left\{1 - \dfrac{E\delta b_2}{\beta_2} + \dfrac{(E\delta b_2)^2}{\beta_2^2} - \cdots\right\}$

converges rapidly. Let us rearrange this series after inserting for $Eb_1$ and $E\delta b_2$
their values, so as to obtain a series in ascending powers of a common multiplier
g which may be applied to the values of x. We recall that in the expression
(13) for $Eb_1$, $B_3$ is of the second order in g, $B_4$ is of the third order, $B_5$ is of the
fourth order, and so forth. In the same way, we find that

     $E\delta b_2 = C_3\beta_3 + C_4\beta_4 + \cdots,$

where

     $C_3 = \dfrac{1}{a}\begin{vmatrix} a_0 & a_1 & a_2 \\ a_1 & a_2 & a_3 \\ a_3 & a_4 & a_5 \end{vmatrix}$

is of the first order, $C_4$ is of the second order, and so forth. Thus in

     $\beta_2\dfrac{Eb_1}{Eb_2} = B_3\beta_3 + (B_4\beta_4 - B_3C_3\beta_3^2/\beta_2)$
          $+ (B_5\beta_5 - B_4C_3\beta_3\beta_4/\beta_2 - B_3C_4\beta_3\beta_4/\beta_2 + B_3C_3^2\beta_3^3/\beta_2^2) + \cdots,$

the first term is of the second order, those in the first parentheses are of the
third order, those in the second parentheses are of fourth order, and the remaining
terms are of higher orders.

We have seen that we can choose distributions for which $B_3 = 0$. In this
way we get rid of the second-order term and reduce the third-order terms to
$B_4\beta_4$. We shall in the next two sections show how, under various conditions,
to select from among the distributions for which $B_3 = 0$ an opposite pair for
each of which $|B_4|$ is a minimum. In choosing between these two opposite
distributions, the criterion we shall adopt is that the terms of third order and
those of fourth order shall have opposite signs; for while the fourth-order terms
may be made much smaller than those of third order in absolute value, still it
is desirable that they should offset them, in order to reduce the error. The
terms of third and of fourth orders reduce respectively for $B_3 = 0$ to $B_4\beta_4$ and
to $B_5\beta_5 - B_4C_3\beta_3\beta_4/\beta_2$. Our criterion is that these are to have opposite signs,
and consequently that

     $B_4\beta_2\beta_4(B_5\beta_2\beta_5 - B_4C_3\beta_3\beta_4) < 0.$


We shall however modify this criterion whenever $\sigma$ is not negligibly small.
A more precise criterion will be obtained by expanding $x_0^2$ in a series of powers
of $\delta b_2$, taking the expectation term by term, and reducing the moments thus
obtained of orders higher than the second to those of first and second orders by
means of the theory of the bivariate normal distribution of $b_1$ and $b_2$. It is
then necessary to make some assumption regarding the order of magnitude of
x, y and z relatively to N in order to assemble terms of like magnitude in a
criterion resembling that above but involving $\sigma$. The appropriate balance
indicated by the results of the next two sections calls for x, y and z to be of the
order of $N^{-1/8}$. This leads to the following criterion:

     $\beta_2(B_4B_5\beta_2\beta_4\beta_5 - B_4^2C_3\beta_3\beta_4^2 - C_3\beta_3\mu\sigma^2) < 0.$


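We have seen that $B_4 = b = xyz$. To evaluate $C_3$ and $B_5$, which latter
may in accordance with (15) be written

     $B_5 = -\dfrac{1}{a}\begin{vmatrix} a_0 & a_1 & a_2 \\ a_2 & a_3 & a_4 \\ a_5 & a_6 & a_7 \end{vmatrix},$

we proceed as in section 3, replacing the second row of (17) by the first
powers to obtain $C_3$, and replacing the third row of (17) by the fifth powers
of x, y and z to obtain $B_5$. In this way we find

     $C_3 = \dfrac{1}{P}\begin{vmatrix} 1 & 1 & 1 \\ x & y & z \\ x^3 & y^3 & z^3 \end{vmatrix}, \qquad B_5 = -\dfrac{1}{P}\begin{vmatrix} 1 & 1 & 1 \\ x^2 & y^2 & z^2 \\ x^5 & y^5 & z^5 \end{vmatrix}.$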


Letting $\Sigma x$, $\Sigma x^2yz$, etc. stand for the symmetric functions of x, y and z of which
one term is written in each case after $\Sigma$, we may reduce these expressions to

     $C_3 = \Sigma x,$
     $B_5 = -\Sigma x^3y - \Sigma x^2y^2 - 2\Sigma x^2yz.$

With the help of (28) and (31) we find

     $\Sigma x^2yz = xyz\,\Sigma x = b\,\Sigma x,$
     $\Sigma x^2y^2 = (\Sigma xy)^2 - 2\Sigma x^2yz = -2b\,\Sigma x,$
     $\Sigma x^3y = \Sigma xy\,\Sigma x^2 - \Sigma x^2yz = -b\,\Sigma x.$

Therefore $B_5 = b\,\Sigma x$. Substituting these values for $B_4$, $C_3$ and $B_5$ in the
last inequality gives the rule:

Choose that one of a pair of opposite distributions for which

(35)   $(x + y + z)\beta_2\{b^2\beta_4(\beta_2\beta_5 - \beta_3\beta_4) - \beta_3\mu\sigma^2\} < 0.$


It will be remembered that $\beta_2$ is negative for a maximum of f(x), positive for a
minimum. The other $\beta$'s can only be estimated from preliminary experimentation,
or possibly in particular cases from general knowledge or theory.
Quite different algebraic methods are appropriate to minimizing $\mu$ with a fixed
b according to the limitations to be placed on the frequencies k, m, n; the methods
leading very simply to a solution in one case involve troublesome complications
in another. We shall deal with two of the leading cases.
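
The symmetric-function reductions that lead to $B_5 = b\,\Sigma x$, and the sign rule (35), can be illustrated numerically. This is a sketch under stated assumptions: the values of x and y are arbitrary, z is solved from $xy + xz + yz = 0$, and the helper names (`third_value`, `symmetric_sums`, `rule_35_holds`) are hypothetical.

```python
from fractions import Fraction as F


def third_value(x, y):
    """Return z such that xy + xz + yz = 0 (vanishing cubic bias)."""
    return -x * y / (x + y)


def symmetric_sums(x, y, z):
    """Symmetric functions used in reducing C3 and B5."""
    ordered_pairs = [(x, y), (y, x), (x, z), (z, x), (y, z), (z, y)]
    return {
        "x": x + y + z,
        "x2yz": x * y * z * (x + y + z),
        "x2y2": (x * y) ** 2 + (x * z) ** 2 + (y * z) ** 2,
        "x3y": sum(p ** 3 * q for p, q in ordered_pairs),
    }


def rule_35_holds(x, y, z, beta2, beta3, beta4, beta5, mu, sigma2):
    """Evaluate inequality (35) for one member of a pair of opposite distributions."""
    b = x * y * z
    bracket = b * b * beta4 * (beta2 * beta5 - beta3 * beta4) - beta3 * mu * sigma2
    return (x + y + z) * beta2 * bracket < 0


x, y = F(2), F(3)
z = third_value(x, y)
b = x * y * z
s = symmetric_sums(x, y, z)
B5 = -s["x3y"] - s["x2y2"] - 2 * s["x2yz"]
print(B5 == b * s["x"])
```

Because (35) contains the factor $x + y + z$ while $b^2$ and $\mu$ are unchanged by reversing all signs, exactly one member of each opposite pair satisfies the inequality whenever the expression in braces is not zero.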


6. The case of equal frequencies. Some experimental situations call for equal
frequencies for all values of the variable. If k = m = n, then $a_0 = N = 3n$.
Let $a_i' = a_i/n$. Then $a_0' = 3$ and $a_1' = \Sigma x$. Inasmuch as

(36)   $\Sigma xy = 0$ and $xyz = b,$

we may express $a_2'$, $a_3'$ and $a_4'$ as functions of $a_1'$ and b as follows:

     $a_2' = \Sigma x^2 = (\Sigma x)^2 - 2\Sigma xy = a_1'^2,$
     $a_3' = \Sigma x^3 = (\Sigma x)^3 - 3\Sigma x^2y - 6xyz;$

and since $\Sigma x^2y = \Sigma x\,\Sigma xy - 3xyz$ we have from (36),

     $a_3' = a_1'^3 + 3b.$

We have also

     $a_4' = \Sigma x^4 = (\Sigma x)^4 - 4\Sigma x^3y - 6\Sigma x^2y^2 - 12\Sigma x^2yz,$

and since

     $\Sigma x^3y = \Sigma xy\,\Sigma x^2 - \Sigma x^2yz, \quad \Sigma x^2yz = xyz\,\Sigma x = a_1'b,$
     $\Sigma x^2y^2 = (\Sigma xy)^2 - 2\Sigma x^2yz = -2a_1'b,$

it follows that

     $a_4' = a_1'^4 + 4a_1'b.$


Therefore

     $a = n^3\begin{vmatrix} 3 & a_1' & a_1'^2 \\ a_1' & a_1'^2 & a_1'^3 + 3b \\ a_1'^2 & a_1'^3 + 3b & a_1'^4 + 4a_1'b \end{vmatrix}.$

Upon subtracting $a_1'$ times the second column from the third, and $a_1'$ times the
first from the second, this becomes

     $a = n^3b\begin{vmatrix} 3 & -2a_1' & 0 \\ a_1' & 0 & 3 \\ a_1'^2 & 3b & a_1' \end{vmatrix} = -n^3b(4a_1'^3 + 27b).$

Also,

     $a_0a_4 - a_2^2 = n^2\{3(a_1'^4 + 4a_1'b) - (a_1'^2)^2\} = 2n^2(a_1'^4 + 6a_1'b).$

Hence, by (12),

(37)   $\mu = \dfrac{a_0a_4 - a_2^2}{a} = -\dfrac{2}{nb}\,\dfrac{a_1'^4 + 6a_1'b}{4a_1'^3 + 27b}.$








Differentiating with respect to $a_1'$ to find a minimum, we obtain

     $0 = (4a_1'^3 + 27b)(4a_1'^3 + 6b) - 12a_1'^2(a_1'^4 + 6a_1'b) = 4a_1'^6 + 60a_1'^3b + 162b^2.$

The minimum of $\mu$, for b fixed, and satisfying the condition $4a_1'^3 + 27b < 0$,
which is equivalent to a > 0 since we assume b > 0, is attained when $a_1'^3 = bq$,
where q is the numerically greater root of the equation $4q^2 + 60q + 162 = 0$,
that is,

     $q = -(15 + \sqrt{63})/2 = -11.468\,626\,97.$


The elementary symmetric functions of the values x, y, z composing the
distribution are

     $\Sigma x = a_1' = (bq)^{1/3}, \quad \Sigma xy = 0, \quad xyz = b.$

Hence x, y and z must be the roots of the equation in u,

(38)   $u^3 - (bq)^{1/3}u^2 - b = 0.$

If we put $u = -(bq)^{1/3}v$,

     $v^3 + v^2 + q^{-1} = 0.$

Calculation gives approximately $q^{-1} = -.087\,194\,396$, and for the roots of
the equation in v,

(39)   .2628, -.3729, -.8899,

numbers which are therefore proportional to the values of the variable that
should be chosen when the frequencies must be equal. If any values x, y, z
proportional to these are used, the value (37) of $\mu$ is

(40)   $\mu' = -\dfrac{6}{N}\,\dfrac{q + 6}{4q + 27}\,q^{1/3}b^{-2/3},$

and is the minimum consistent with any fixed value b of xyz.
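
The constants of this section can be reproduced with a few lines of arithmetic. The sketch below assumes the reduced cubic is taken in the form $v^3 + v^2 + q^{-1} = 0$, which is the sign convention that yields the three roots listed in (39); the bisection helper and the bracketing intervals are illustrative choices, not part of the paper.

```python
import math

# q is the numerically greater root of 4q^2 + 60q + 162 = 0.
q = -(15 + math.sqrt(63)) / 2


def cubic(v):
    """Left side of the reduced cubic v^3 + v^2 + 1/q."""
    return v ** 3 + v ** 2 + 1.0 / q


def bisect(f, lo, hi, steps=60):
    """Plain bisection on an interval over which f changes sign."""
    f_lo = f(lo)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# One sign change of the cubic lies in each of these intervals.
roots = [bisect(cubic, 0.0, 1.0), bisect(cubic, -0.6, 0.0), bisect(cubic, -1.0, -0.7)]

# Factor multiplying b^(-2/3)/N in (40); the real cube root of the negative q is used.
cbrt_q = -((-q) ** (1 / 3))
mu_coeff = -6 * (q + 6) / (4 * q + 27) * cbrt_q

print(q, roots, mu_coeff)
```

The three roots sum to -1 and have no pairwise-product term, which mirrors the conditions $\Sigma xy = 0$ and $xyz = b$ after rescaling.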

Choice of the factor of proportionality will involve a compromise between
the criteria of minimum sampling variance and minimum bias. If we ignore
components of bias of orders higher than the fourth and recall (10) and (11) it
will appear that the appropriate combined criterion is that

(41)   $b^2\beta_4^2 + \mu\sigma^2$

shall be a minimum. Putting for $\mu$ its value $\mu'$ from (40) and differentiating
with respect to b gives

     $2b\beta_4^2 + \dfrac{4\sigma^2}{N}\,\dfrac{q + 6}{4q + 27}\,q^{1/3}b^{-5/3} = 0,$

or

     $b = b' = \left[-\dfrac{2\sigma^2}{N\beta_4^2}\,\dfrac{q + 6}{4q + 27}\,q^{1/3}\right]^{3/8}.$


The product of the three roots (39) is $-q^{-1}$. Numbers proportional to them and
having the product b' will be obtained by multiplying them by $-(b'q)^{1/3}$, that
is, by

     $\left(\dfrac{\sigma^2}{N\beta_4^2}\right)^{1/8}\left(2\,\dfrac{q + 6}{4q + 27}\right)^{1/8}(-q)^{3/8} = 2.3318\left(\dfrac{\sigma^2}{N\beta_4^2}\right)^{1/8}.$

Multiplying (39) by 2.3318 gives numbers

(42)   .6128, -.8695, -2.0751,

which must still be multiplied by $\pm[\sigma^2/(N\beta_4^2)]^{1/8}$ to give the set minimizing
$Eb_1^2$. The ambiguous sign is to be fixed according to the rule at the end of
the last section. Thus we arrive finally at the conclusion:

If the numbers of observations are required to be the same for all the values of the
variable used, these values should for greatest efficiency deviate from the estimated
maximizing value by the products of the three numbers (42) by

(43)   $\pm\left(\dfrac{\sigma^2}{N\beta_4^2}\right)^{1/8},$

choosing the ambiguous sign so as to satisfy (35).

The product b' of the three values is to be substituted for b in (40) and (35),
and the value of $\mu$ thus obtained from (40) is also to be substituted in (35).
These substitutions yield

     $(x + y + z)\beta_2\beta_4(\beta_2\beta_5 - 4\beta_3\beta_4) < 0$

as the criterion for choosing the sign in (43).

The expectation of the square of the error in the estimate $x_0$ of $\xi$
is, according to (9) and (10), given approximately by the ratio of (41) to $4\beta_2^2$,
and it is this that will be a minimum when the foregoing rule is followed. The
minimum of (41) is obtained by replacing b by b' in (40) and (41), and substituting
(40) for $\mu$ in (41). This gives

     $4\left(2\,\dfrac{q + 6}{4q + 27}\right)^{3/4}(-q)^{1/4}N^{-3/4}\beta_4^{1/2}\sigma^{3/2};$

that is,

(44)   $E(\delta b_1)^2 = 4.889\,N^{-3/4}\beta_4^{1/2}\sigma^{3/2}.$
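
The two numerical constants of the equal-frequency rule, 2.3318 and 4.889, follow directly from q. A minimal sketch, assuming the rounded roots (39) as printed; the function `equal_frequency_offsets` and its parameter names are hypothetical, and the sign argument stands for the choice made by (35).

```python
import math

q = -(15 + math.sqrt(63)) / 2
ratio = 2 * (q + 6) / (4 * q + 27)

# Multiplier applied to the roots (39) to obtain (42).
scale = ratio ** (1 / 8) * (-q) ** (3 / 8)

# Numerical factor in (44).
coeff_44 = 4 * ratio ** (3 / 4) * (-q) ** (1 / 4)

ROOTS_39 = (0.2628, -0.3729, -0.8899)  # rounded values as printed in (39)


def equal_frequency_offsets(sigma2, n_total, beta4, sign=1):
    """Deviations from the estimated maximizing value, per (42) and (43).

    sigma2 is the error variance of one observation, n_total the total
    number of observations, beta4 the quartic coefficient of f.
    """
    unit = (sigma2 / (n_total * beta4 ** 2)) ** (1 / 8)
    return [sign * scale * v * unit for v in ROOTS_39]


print(scale, coeff_44, equal_frequency_offsets(1.0, 1, 1.0))
```

Because the offsets depend on $\sigma^2/(N\beta_4^2)$ only through its eighth root, even a rough preliminary estimate of that ratio changes the recommended spacing very little.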


7. Adjustable frequencies. If the total number N of observations to be made
can be distributed freely among the values of the variable, the efficiency of the
experiment can be increased by a proper selection of the individual frequencies
k, m, n along with the corresponding values x, y, z. We shall choose these
six unknowns, subject to the three conditions⁴

(45)   $k + m + n = N,$
(46)   $xy + xz + yz = 0,$
(47)   $xyz = -b,$

to minimize $\mu$. The last condition fixes the quartic bias, the preceding one
expresses the vanishing of the cubic bias. It is of course understood that k, m, n
are all positive, and we shall, as before, suppose initially that b is positive. No
two of x, y, z can be equal, and it follows that none of them, or of the sums
of two of them, can be zero while satisfying the second condition. We shall
lose no generality in assuming that

(48)   $x > y > 0 > z.$

Furthermore, it is easy to see that x + y, x + z, and y + z are all positive.
Therefore the quantities

(49)   $r = \dfrac{y + z}{(x - y)(x - z)}, \quad s = \dfrac{x + z}{(x - y)(y - z)}, \quad t = \dfrac{x + y}{(x - z)(y - z)}$

are all positive. From (34) we have


(50)   $\mu = \dfrac{r^2}{k} + \dfrac{s^2}{m} + \dfrac{t^2}{n}.$

The values of k, m, n making this a minimum while themselves subject to the
limitation that their sum is N must if they were continuous positive variables
be proportional to r, s and t. Of course the frequencies are integers, but we are
supposing N large, so that the values found by differentiation will be close
approximations, and we shall disregard this complication. Put therefore

(51)   $r = kp, \quad s = mp, \quad t = np,$

where p is a multiplier which evidently is not zero. If we use these equations
to eliminate r, s, t from $\mu$ we obtain, with the help of (45), $\mu = Np^2$. But if we
use them to eliminate k, m, n from (50) we have instead,

     $\mu = (r + s + t)p.$

Now from (49),

(52)   $r + t = s,$

so that $\mu = 2sp$. Therefore Np = 2s, and finally $\mu = 4s^2/N$. Therefore $\mu$ is
a minimum when the positive quantity s is a minimum. In the expression
(49) for s we substitute from (46) and (47)

(53)   $x + z = -xz/y = b/y^2,$
       $(x - y)(y - z) = (x + z)y - xz - y^2 = 2b/y - y^2,$

so that

(54)   $s = \dfrac{b}{y(2b - y^3)}.$

Since y, s and b are positive, this shows that $y^3 < 2b$. The value of y on the
interval from 0 to $(2b)^{1/3}$ making s a minimum is found by differentiation to be
$2^{-1/3}b^{1/3}$. Substituting this in (53) and (47) gives

     $x + z = 2^{2/3}b^{1/3}, \quad xz = -2^{1/3}b^{2/3},$

whence

(55)   $x = (b/2)^{1/3}(1 + \sqrt{3}), \quad y = (b/2)^{1/3}, \quad z = (b/2)^{1/3}(1 - \sqrt{3}).$

⁴ The condition (47) is here used instead of (31), from which it differs by the introduction
of the negative sign, because it simplifies the argument of this section slightly to have the
quantities (49) positive. There is no essential difference, since we are seeking a pair of
opposite distributions.


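From (45), (51) and (52) it is seen that k + n = m = N/2. Thus half the total
observations are to be concentrated on the middle value. From (51) and (49)
we have also

     $\dfrac{k}{n} = \dfrac{r}{t} = \dfrac{y^2 - z^2}{x^2 - y^2},$

wherefore

     $\dfrac{k}{k + n} = \dfrac{y^2 - z^2}{x^2 - z^2}.$

With (55) this shows that

(56)   $k = N(2 - \sqrt{3})/8 = .03349\,N, \quad m = N/2, \quad n = N(2 + \sqrt{3})/8 = .46651\,N.$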
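
The relations (52) and (56) can be confirmed numerically from (49) and (55). A minimal sketch, taking b = 1 for illustration since the frequency proportions do not depend on b; the function name `adjusted_values` is hypothetical.

```python
import math


def adjusted_values(b):
    """Values x, y, z from (55) for a given positive b."""
    c = (b / 2) ** (1 / 3)
    return c * (1 + math.sqrt(3)), c, c * (1 - math.sqrt(3))


x, y, z = adjusted_values(1.0)

# Quantities (49).
r = (y + z) / ((x - y) * (x - z))
s = (x + z) / ((x - y) * (y - z))
t = (x + y) / ((x - z) * (y - z))

# By (51) the frequencies k, m, n are proportional to r, s, t.
total = r + s + t
proportions = (r / total, s / total, t / total)

print(proportions)
```

The smallest share of the observations goes to x, the value farthest from the origin, and half of them go to the middle value y.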


We have seen that $\mu = 4s^2/N$. Substituting in (54) the value found for y
gives $s = 2^{4/3}3^{-1}b^{-1/3}$. Therefore the minimum of $\mu$ for a fixed value of b is

(57)   $\mu = (16/9N)(2/b)^{2/3}.$

Inserting this in the expression (41) for the total expectation of the squared
error and then differentiating with respect to b gives

(58)   $b = 2^{7/4}3^{-9/8}N^{-3/8}\beta_4^{-3/4}\sigma^{3/4}.$





When this value is given to b, (41) becomes

(59)   $3.8207\,N^{-3/4}\beta_4^{1/2}\sigma^{3/2}.$


The greater efficiency of experiments with the frequencies (56) and the
correspondingly adjusted values x, y, z, in comparison with the case in which the
frequencies must be equal, corresponds to the smaller coefficient in (59) than in
(44). To obtain as great accuracy with equal frequencies as with adjusted
ones it is necessary to have more observations, in a ratio obtained by equating
(59) with (44) after inserting different symbols for N in the two cases. In this
way it is found that the number of observations required with efficient distribution
of the frequencies is almost exactly 72 per cent of the number required
when the frequencies are equal, if the values x, y, z are in each case given their
most efficient values.
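
The 72 per cent figure can be recomputed from (57), (58) and the factor in (44). The sketch below works in units where $\sigma^2/(N\beta_4^2) = 1$, so that only the numerical coefficients remain; the variable names are illustrative.

```python
import math

# Adjustable frequencies: b from (58), then (41) with mu taken from (57).
b_adj = 2 ** (7 / 4) * 3 ** (-9 / 8)
coeff_59 = b_adj ** 2 + (16 / 9) * (2 / b_adj) ** (2 / 3)

# Equal frequencies: numerical factor in (44).
q = -(15 + math.sqrt(63)) / 2
ratio = 2 * (q + 6) / (4 * q + 27)
coeff_44 = 4 * ratio ** (3 / 4) * (-q) ** (1 / 4)

# Both expressions vary as N^(-3/4), so equal accuracy requires
# coeff_59 * N_adj^(-3/4) = coeff_44 * N_eq^(-3/4).
n_ratio = (coeff_59 / coeff_44) ** (4 / 3)

print(coeff_59, coeff_44, n_ratio)
```

The exponent 4/3 arises because the expected squared error falls off as $N^{-3/4}$ rather than $N^{-1}$.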
Substituting (58) in (55) gives the numbers

(60)   2.1520, .7877, -.2110,

multiplied by (43), with a change of signs if necessary to satisfy (35), as the
values x, y, z of the variable to be used. The more concentrated character of
this distribution with adjustable frequencies is emphasized by the small proportion,
less than 3½ per cent, of the frequencies (56) that pertains to the value most
remote from the tentative maximizing value.
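
Evaluating (55) at the value (58) gives a direct numerical check on (60). In this sketch, which works in units where $\sigma^2/(N\beta_4^2) = 1$, the first two numbers agree with (60) to about four decimals. For the third, direct evaluation of $(b/2)^{1/3}(1 - \sqrt{3})$ gives about -.5766, whereas the printed -.2110 coincides with $(b/2)^{1/3}(2 - \sqrt{3})$; the third entry of (60) therefore seems worth rechecking against (55) and (46). Variable names are illustrative.

```python
import math

# (58) in units where sigma^2 / (N * beta4^2) = 1.
b_unit = 2 ** (7 / 4) * 3 ** (-9 / 8)

# Common factor (b/2)^(1/3) appearing in (55).
c = (b_unit / 2) ** (1 / 3)

x = c * (1 + math.sqrt(3))
y = c
z = c * (1 - math.sqrt(3))

cubic_condition = x * y + x * z + y * z   # condition (46): should vanish
product = x * y * z                       # condition (47): should equal -b_unit
printed_third = c * (2 - math.sqrt(3))    # magnitude matching the printed .2110

print(x, y, z, printed_third)
```

The check on (46) is the decisive one, since a set of values for which $xy + xz + yz$ is not zero would leave the cubic bias in place.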

When (58) is substituted in (57) and, with the result, in (35), this inequality
reduces to exactly the same form as that obtained in the preceding section for
fixing the sign of (43).


8. Introduction to the two-variable problem. Functions of two or more
variables are of greater practical importance than functions of one variable.
The recent work on factorial experiments [8] makes it clear that in the experimental
determination of maxima of functions of several variables, considerable
improvements are possible over the practice of trying the effect of variations
in only one variable at a time while holding the others constant. It seems likely
that the methods worked out in the previous sections for experimenting with
one variable are capable of generalization. However certain difficulties enter
which have not yet been surmounted. The object of the present section is to
indicate something of the nature of the problem of extending the foregoing
results to two variables, x and y.

Let us suppose that a quadratic regression equation,

     $Z = b_{00} + b_{10}x + b_{01}y + \tfrac12(b_{20}x^2 + 2b_{11}xy + b_{02}y^2),$

will be fitted by least squares to observations of z = f(x, y) based on N combinations
of x and y, each of which represents a point in a plane. Since there are
six coefficients to be determined, there must be at least six distinct points
$(x_1, y_1), \cdots, (x_6, y_6)$. The coefficients in the normal equations may be written
$a_{jk} = Sx^jy^k$, so that $a_{00} = N$. The determinant

     $a = \begin{vmatrix}
     a_{00} & a_{10} & a_{01} & a_{20} & a_{11} & a_{02} \\
     a_{10} & a_{20} & a_{11} & a_{30} & a_{21} & a_{12} \\
     a_{01} & a_{11} & a_{02} & a_{21} & a_{12} & a_{03} \\
     a_{20} & a_{30} & a_{21} & a_{40} & a_{31} & a_{22} \\
     a_{11} & a_{21} & a_{12} & a_{31} & a_{22} & a_{13} \\
     a_{02} & a_{12} & a_{03} & a_{22} & a_{13} & a_{04}
     \end{vmatrix}$

must not vanish. Let the function under investigation be

     $f(x, y) = \Sigma\,\beta_{jk}x^jy^k/(j!\,k!)$


and suppose that $\beta_{10} = 0 = \beta_{01}$, so that the origin is the point sought at which
the first derivatives vanish. We shall assume that

     $\beta = \beta_{20}\beta_{02} - \beta_{11}^2 > 0, \quad \beta_{20} < 0,$

implying a definite maximum. The estimates $x_0$, $y_0$ of the maximizing (or
minimizing) values obtained by differentiating Z are

     $x_0 = (b_{11}b_{01} - b_{02}b_{10})/b, \quad y_0 = (b_{11}b_{10} - b_{20}b_{01})/b,$

where

     $b = b_{20}b_{02} - b_{11}^2.$

For large samples and values of x and y taken not too far from the origin, b will
approximate to $\beta$, and $x_0$ and $y_0$ respectively to

     $(\beta_{11}b_{01} - \beta_{02}b_{10})/\beta, \quad (\beta_{11}b_{10} - \beta_{20}b_{01})/\beta.$
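
The formulas for $x_0$ and $y_0$ are the solution of the two linear equations obtained by setting the partial derivatives of Z to zero. A minimal sketch follows; the function names are hypothetical and the coefficients in the example are made up so that the stationary point is known in advance.

```python
def stationary_point(b10, b01, b20, b11, b02):
    """Stationary point of Z = b00 + b10*x + b01*y + (b20*x^2 + 2*b11*x*y + b02*y^2)/2."""
    b = b20 * b02 - b11 ** 2
    if b == 0:
        raise ValueError("degenerate quadratic part: b20*b02 - b11^2 = 0")
    x0 = (b11 * b01 - b02 * b10) / b
    y0 = (b11 * b10 - b20 * b01) / b
    return x0, y0


def is_definite_maximum(b20, b11, b02):
    """Condition b20*b02 - b11^2 > 0 with b20 < 0."""
    return b20 * b02 - b11 ** 2 > 0 and b20 < 0


# Z = -(x-1)^2 - (x-1)(y-2) - (y-2)^2 expands to -x^2 - xy - y^2 + 4x + 5y - 7,
# so b10 = 4, b01 = 5, b20 = -2, b11 = -1, b02 = -2.
print(stationary_point(4, 5, -2, -1, -2), is_definite_maximum(-2, -1, -2))
```

The guard against a zero denominator reflects the requirement that b approximate a nonvanishing $\beta$.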


Some means is needed of combining into one the two desiderata of minimizing
the errors $x_0$ and $y_0$. A combined measure of these deviations is

     $\beta_{20}x_0^2 + 2\beta_{11}x_0y_0 + \beta_{02}y_0^2.$

This expression is constant except for terms of higher order when $x_0$ and $y_0$,
while remaining small, vary in such a way that f(x, y) maintains a constant
value. Substituting in it the approximate values of $x_0$ and $y_0$ gives $\beta^{-1}$ times

     $\beta_{02}b_{10}^2 - 2\beta_{11}b_{10}b_{01} + \beta_{20}b_{01}^2.$

The expectation of this measure of error may be separated into two parts by
means of the formulae for the variances and covariance,

     $\sigma_{b_{10}}^2 = Eb_{10}^2 - (Eb_{10})^2, \quad \sigma_{b_{10}b_{01}} = Eb_{10}b_{01} - (Eb_{10})(Eb_{01}),$ etc.

One of these parts is a generalized sampling variance,

     $\beta_{02}\sigma_{b_{10}}^2 - 2\beta_{11}\sigma_{b_{10}b_{01}} + \beta_{20}\sigma_{b_{01}}^2,$

and tends to zero with order $N^{-1}$ as N increases provided the values $(x_k, y_k)$ are
fixed. The other part,

(61)   $\beta_{02}(Eb_{10})^2 - 2\beta_{11}(Eb_{10})(Eb_{01}) + \beta_{20}(Eb_{01})^2,$

is a bias which does not tend to zero as N increases, but which may be kept
arbitrarily small, at the expense of the sampling variance, by restricting the
values $(x_k, y_k)$ to be sufficiently small. This expression is a negative definite
quadratic form in $Eb_{10}$ and $Eb_{01}$, and therefore cannot be zero unless both these
components of bias vanish separately.

We may proceed as in paragraph 2 to express $Eb_{10}$ and $Eb_{01}$ in terms of the
coefficients of f(x, y) of orders higher than the second, among which those of
third order will be of leading importance. In this way it may be shown that,
if we neglect terms in f(x, y) of orders higher than the third, $Eb_{10}$ and $Eb_{01}$ are
given by the ratios to a constant multiple of a of determinants obtained from
a by replacing respectively the second and the third columns by the column

     $\beta_{30}a_{30} + 3\beta_{21}a_{21} + 3\beta_{12}a_{12} + \beta_{03}a_{03}$
     $\beta_{30}a_{40} + 3\beta_{21}a_{31} + 3\beta_{12}a_{22} + \beta_{03}a_{13}$
     $\beta_{30}a_{31} + 3\beta_{21}a_{22} + 3\beta_{12}a_{13} + \beta_{03}a_{04}$
     $\beta_{30}a_{50} + 3\beta_{21}a_{41} + 3\beta_{12}a_{32} + \beta_{03}a_{23}$
     $\beta_{30}a_{41} + 3\beta_{21}a_{32} + 3\beta_{12}a_{23} + \beta_{03}a_{14}$
     $\beta_{30}a_{32} + 3\beta_{21}a_{23} + 3\beta_{12}a_{14} + \beta_{03}a_{05}.$


It is desirable to select a distribution of points $(x_k, y_k)$ such that these components
of bias will vanish, no matter what may be the values of $\beta_{30}$, $\beta_{21}$, $\beta_{12}$ and
$\beta_{03}$. For this it is necessary and sufficient that all the determinants vanish
that are obtained from these two by replacing the column written above by the
terms in it that multiply any one of the four $\beta_{jk}$'s. The single-variable analogy
suggests using a distribution having the smallest possible number of points,
which in this case is six. Let us now take N = 6. The eight determinants will
all be multiples of

     $\begin{vmatrix}
     1 & x_1 & y_1 & x_1^2 & x_1y_1 & y_1^2 \\
     \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\
     1 & x_6 & y_6 & x_6^2 & x_6y_6 & y_6^2
     \end{vmatrix}.$

To save space we shall indicate determinants of this character merely by writing
a single row without subscripts, thus:

     $P = |\,1 \;\; x \;\; y \;\; x^2 \;\; xy \;\; y^2\,|.$











If we define

     $\Delta_{jk} = |\,1 \;\; x^jy^k \;\; y \;\; x^2 \;\; xy \;\; y^2\,|,$
     $\Delta'_{jk} = |\,1 \;\; x \;\; x^jy^k \;\; x^2 \;\; xy \;\; y^2\,|,$

and multiply each of these determinants for which j + k = 3 (j, k = 0, 1, 2, 3)
by P, columns by columns, we shall have exactly the determinants whose vanishing
is the condition for nullification of the cubic bias. If we multiply P by
itself in the same way we have $P^2 = a$. Therefore $P \ne 0$. Therefore the required
condition is that the distribution satisfy the eight equations

     $\Delta_{30} = 0, \quad \Delta_{21} = 0, \quad \Delta_{12} = 0, \quad \Delta_{03} = 0,$
     $\Delta'_{30} = 0, \quad \Delta'_{21} = 0, \quad \Delta'_{12} = 0, \quad \Delta'_{03} = 0,$

and the inequality $P \ne 0$.
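
The identity $P^2 = a$ for six points can be confirmed with exact arithmetic. A minimal sketch, assuming the moment matrix is formed from the monomials 1, x, y, x², xy, y² in that order; the six points are arbitrary illustrative values and the helper names are hypothetical.

```python
from fractions import Fraction as F


def det(matrix):
    """Determinant by Gaussian elimination with exact fractions."""
    m = [[F(v) for v in row] for row in matrix]
    size = len(m)
    result = F(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return F(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return result


def design_matrix(points):
    """One row (1, x, y, x^2, xy, y^2) per point."""
    return [[1, x, y, x * x, x * y, y * y] for x, y in points]


def moment_matrix(points):
    """Matrix of sums of products of the monomials over all points."""
    rows = design_matrix(points)
    return [[sum(r[i] * r[j] for r in rows) for j in range(6)] for i in range(6)]


points = [(0, 0), (1, 0), (0, 1), (2, 1), (1, 3), (-1, 2)]
P = det(design_matrix(points))
a = det(moment_matrix(points))
print(P != 0, a == P * P)
```

With exactly six points the moment matrix is the product of the design matrix with its transpose, which is why its determinant is a perfect square.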

In seeking distributions nullifying the cubic bias we have twelve unknowns
$x_1, \cdots, x_6, y_1, \cdots, y_6$ which must satisfy these eight equations. This suggests
that we give arbitrary values to four of them and then solve for the other
eight by straightforward elimination. Unfortunately, since the eight equations
are each of the tenth degree, reducing to the ninth degree when coordinates
of two of the points are given numerical values, a straightforward elimination
would seem to lead to an equation of degree $9^8 = 43,046,721$. The number
of algebraic operations in performing the elimination, solving the equation for
one of the unknowns, substituting back, and solving for the others, would be a
large multiple of this number, and would doubtless be sufficient to occupy a
large and efficient computing project for many millenniums. At the end of this
period it might be found that the roots corresponding to the original arbitrary
values chosen were all complex or made P = 0, and were therefore unusable.
Thus indirect and less elementary methods are called for, and some qualitative
investigations of such distributions, if they exist (which is not certain), are in
order.

The set of conditions as a whole is invariant under all non-singular homogeneous
linear transformations of x and y, as is easily proved by making linear
combinations of the columns of each of the determinants $\Delta_{jk}$, $\Delta'_{jk}$ and P, and
by making linear combinations of these determinants themselves. These
linear transformations leave the origin invariant. They have four degrees of
freedom, which is exactly the right number to take care of the excess of unknowns
over equations. This points to the possible existence of a finite number
of fundamental solutions, from which all solutions may be obtained by linear
homogeneous transformations. Geometrical properties of the configuration will
be represented by invariants under linear transformations. Thus the condition
$P \ne 0$ means that the six points must not all lie on any conic section. From
this it follows at once that no four of them can lie on a straight line, since this
line, with the line through the other two, would constitute a degenerate conic.
As a matter of fact, we can go further and prove that no three of the points
may lie on a straight line. In the proof of this and other properties of the
distribution it is convenient to use the arbitrariness provided by a linear
transformation to pass the axes (which may be oblique) through any two of the six
points, and then to adjust the scales of measurement so that the coordinates of
these points become (1, 0) and (0, 1), except that one of them might conceivably
be the origin. If three points are collinear, their line can be taken to be the
x-axis if it passes through the origin, or the line y = 1 if it does not. Even with
the help provided by such procedures the proofs are rather long, though
straightforward. We shall content ourselves here with stating, without proof, the
following properties necessary for sets of six points for which $P \ne 0$ and all
components of the cubic bias vanish:

No three of the points can lie on a straight line. 

No two straight lines through the origin can contain four of the points. 

No four of the points can lie on the vertices of a parallelogram. 

The set cannot consist of the origin and the vertices of a regular pentagon with 
center at the origin. 

These conditions have been established by calculations of a rather straight- 
forward and laborious sort, too long to be reproduced. 
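
The geometric reading of the condition on P, that six points lying on a conic (including a degenerate one) make P vanish, can be illustrated with exact arithmetic. This sketch covers only that condition, not the four properties stated without proof; the point sets are made-up examples and the helper names are hypothetical.

```python
from fractions import Fraction as F


def det(matrix):
    """Determinant by Gaussian elimination with exact fractions."""
    m = [[F(v) for v in row] for row in matrix]
    size = len(m)
    result = F(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return F(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return result


def conic_determinant(points):
    """P = |1 x y x^2 xy y^2| evaluated over six points."""
    return det([[1, x, y, x * x, x * y, y * y] for x, y in points])


# All six points on the parabola y = x^2.
on_parabola = [(t, t * t) for t in (-2, -1, 0, 1, 2, 3)]

# Four points on the line y = 1; with the line through the other two
# points this forms a degenerate conic through all six.
four_collinear = [(0, 1), (1, 1), (2, 1), (3, 1), (0, 0), (5, 7)]

# A set not lying on any conic.
general = [(0, 0), (1, 0), (0, 1), (2, 1), (1, 3), (-1, 2)]

print(conic_determinant(on_parabola), conic_determinant(four_collinear),
      conic_determinant(general) != 0)
```

P vanishes exactly when some nonzero combination of the six monomials is zero at every point, which is the statement that a conic passes through all of them.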

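If $z = x + iy$ and $\bar z = x - iy$, the conditions $P \ne 0$, $\Delta_{jk} = 0 = \Delta'_{jk}$
may be written

     $|\,1 \;\; z \;\; \bar z \;\; z^2 \;\; z\bar z \;\; \bar z^2\,| \ne 0, \quad |\,1 \;\; z^j\bar z^k \;\; \bar z \;\; z^2 \;\; z\bar z \;\; \bar z^2\,| = 0, \quad |\,1 \;\; z \;\; z^j\bar z^k \;\; z^2 \;\; z\bar z \;\; \bar z^2\,| = 0.$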


9. Some further unsolved problems. Since it is useful to demarcate the 
frontiers of knowledge by pointing out what lies a little outside them as well 
as what is within, a few of the many questions may be mentioned which this 
paper falls short of answering. Besides the extension to two variables men- 
tioned in the last section, and to an arbitrary number of variables, it is desirable 
that the whole theory should be developed from an exact, or small-sample, 
point of view rather than on the basis of the large-sample approximations used 
here. This however appears to be an extremely large enterprise. A simpler, 
but still quite difficult, problem is to modify the criteria obtained in paragraphs 
6 and 7 so as to fit problems of economic experimentation, such as those of 
determination of maximum monopoly profit or minimum cost, in which the cost 
of each observation consists largely of the lost profit, or excess cost over the 
minimum, occasioned by the deviation from the value sought. In such a case 
the limitation of cost replaces the limitation of the total number of observations. 

Another important problem is to take account of the inaccuracy of the pre- 
liminary information on which the design of the experiment is based, and to 
utilize the relations thus involved to design efficient sequences of experiments. 

Determination of limits of error in terms of the maxima over an interval of the 
derivatives of f(x) should be a fairly straightforward problem in analysis and 
have practical importance. With this are associated various problems dealing 
with maxima of functions having discontinuities in the first or higher derivatives 
at or near the maximum. 








An important extension would deal with the case in which the maximum is
estimated from a least-squares polynomial of degree three or more. This might
be connected with the difficult wider problem of deciding on the degree of a
polynomial to be fitted in a particular case.


























10. Summary. In determining the value $\xi$ of x for which f(x) is a maximum
or minimum, a quadratic polynomial may be fitted to observations made for
chosen values of x. The errors considered are of two kinds: sampling errors
resulting from the inaccuracy in each observation, which diminish as the number
of observations is increased, but increase if the values of x are chosen too close
to the value sought; and biased errors resulting from the fact that f(x) is not
truly quadratic, which do not decrease when the number of observations increases
with a fixed set of values of x, but do decrease when the deviations of x
from the value sought are reduced. The biased errors may be separated into
components corresponding to the third, fourth and higher powers of $x - \xi$ in
the expansion of f(x), and these components will ordinarily be of diminishing
importance as we go on in the sequence. However it is possible to choose values
of x making the cubic component zero and the quartic component at the same
time a minimum. Such a set consists of only three values of x. These values
may be further adjusted to minimize the expectation of the square of the total
error in $\xi$, as far at least as the term of fourth order in the bias, by a proper
balance between the sampling variance and the quartic bias. The values of x
satisfying these conditions, measured from the true maximizing or minimizing
value $\xi$, are the products of $[\sigma^2/(N\beta_4^2)]^{1/8}$ by the values u in the table below.
Since the root will usually be extracted by logarithms, the common logarithms
of the values are given. The first set is the most efficient when the frequencies
must be equal. The second set is appropriate when the frequencies are made
proportional to the quantities in the last column; in this case only about 72 per
cent as many observations are required for any specified accuracy as when the
frequencies must be equal. The approximate expected squared errors in the
estimates of $\xi$ in the two cases are given respectively by formulae (44) and (59).
All these results are approximations of the kind appropriate to large numbers of
observations.









       Equal frequencies                 Adjustable frequencies

        u       log10 |u|            u       log10 |u|     Frequency

     -.6128     -.21267           -.2110     -.67572       .46651 N
      .8695     -.06071            .7877     -.10364       .50000 N
     2.0751      .31704           2.1520      .33284       .03349 N








The signs of u should be reversed if $\beta_2\beta_4(\beta_2\beta_5 - 4\beta_3\beta_4) > 0$. Here $\beta_j$ is the
coefficient of $(x - \xi)^j$ in the expansion of f(x), and $\sigma^2$ is the error variance of
an individual observation. For designing an efficient experiment it is necessary
to have some knowledge of these quantities. It may be gained from preliminary
experiments of smaller scale.

A suitable preliminary experiment, where knowledge of the function is
extremely scanty, might consist of a fixed small number, greater than one, of
observations on f(x) corresponding to each of a set of six or more values of x in
arithmetic progression covering an interval that includes the value $\xi$ sought,
and selected with a view to getting $\xi$ in the center of it as nearly as possible.
A polynomial of the fifth degree at least should be fitted by least squares, in
which process all the quantities desired for the design of the later, larger
experiment can be estimated, together with their accuracies. Since the values of x
are taken in arithmetic progression, the fitting can be carried out with extreme
ease by the method of orthogonal polynomials.

Numerous subsidiary questions promise to have both practical importance 
and mathematical interest. 


REFERENCES 


[1] Felix Bernstein, Hartmann and Bauer Handbuch der Erblichkeitslehre, Vol. 1, Berlin,
1928.

[2] S. O. Rice, "The distribution of the maximum of a random curve," Am. Jour. of Math.,
Vol. 61 (1939), pp. 409-416.

[3] G. W. Elmen, "Magnetic alloys of iron, nickel and cobalt in communication circuits,"
Elec. Eng., Vol. 54 (1935), pp. 1292-1299.

[4] E. E. Schumacher and A. G. Souden, "Some alloys of copper and iron," Metals and
Alloys, Vol. 7 (1936), pp. 95-101.

[5] P. C. Mahalanobis, "A sample survey of the acreage under jute in Bengal, with
discussion on planning of experiments," Proc. 2nd Ind. Stat. Conf., Calcutta,
Statistical Publishing Soc. (1940).

[6] H. G. Romig, Allowable average in sampling inspection, Columbia Univ. thesis, privately
printed, 1939.

[7] J. V. Uspensky, Introduction to mathematical probability, New York, McGraw-Hill,
1937.

[8] R. A. Fisher, The design of experiments, London and Edinburgh, Oliver and Boyd,
1935, Chap. 6.








ON A STATISTICAL PROBLEM ARISING IN ROUTINE ANALYSES 
AND IN SAMPLING INSPECTIONS OF MASS PRODUCTION 


By J. NEYMAN 


University of California, Berkeley, Calif. 


CONTENTS 

Page 
1. Introduction.............. ih ial La dies hrachehiaimcishrnin ah alta hve katie wb wi naha We ok oes 46 
2. @rage@ent hypotnesis 7 to be tested ............. 2. icine cee ss iicdacecessessweeeevcn 48 
i ie vdnm dienes aoe eteduwas esa envs sae smeunsws 48 
4. Regions similar to the sample space with regard to a, & , &2, --- , Ew............. 55 
6. Phe eet of hypotheses AMCIHAtIVe TO Fw... ccs cess sccsccseesseeeene 61 
6. The best critical region for testing H against a particular alternative............. 65 
ees CN OI on ne cea sec eset seddwsvnecedesavetessacwe 66 
ee TIN ONSEN Conoco 6 55:5 GWG la td wills 06 ora sen OS a Biansid da Sia aren CSW eel Sata 70 
eee ea ia baa Wanlink idee kei adm Wain nank whan whee 76 


1. Introduction. The words "routine analyses" are used to denote the analyses
performed by laboratories, frequently attached to industrial plants, and
distinguished by the following characteristics: (1) All the analyses or measurements
are of the same kind, for example, are designed to measure the sugar
content in beets or to determine the coordinate of a star. (2) The analyses are
carried out day after day using the same methods and the same instruments.
(3) While all the analyses are of the same kind, the quantity measured varies
from time to time and each such quantity is measured repeatedly n times,
where n represents some small number, 2, 3, 4, 5.

As an illustration we may consider the routine analyses of sugar beets performed
in the process of selection and breeding. A small section is cut out of
each of a great number of sugar beets expected to be suitable for further breeding.
It is crushed and its juice extracted to determine $\xi$, the sugar content of
each particular beet. From the juice available from each beet n samples are
taken and a determination of the sugar content is made from each. Thus, if
$\xi_i$ represents the sugar content of the section from the ith beet and there are
N beets, the laboratory will have to make nN analyses with their results $x_{i1}$,
$x_{i2}, \cdots, x_{in}$, representing the measurements of the same quantity $\xi_i$.
Obviously the sugar content $\xi_i$ referring to the ith beet need have no relation to
that of any other jth beet.

An essential point in the above description is that the number of measurements
referring to the same quantity $\xi_i$ is usually very small. For example, the
quantitative analyses of urine in certain clinics are performed only twice for
each patient, so that n = 2. Frequently, various practical considerations make
it impossible to increase this number n of analyses intended to measure the same
quantity $\xi_i$.

The smallness of n introduces difficulties in estimating $\xi_i$. It is usual to
consider $x_{i,1}, x_{i,2}, \dots, x_{i,n}$ as independent variables, varying
normally about $\xi_i$ with an unknown standard error $\sigma_i$. If they have to
be used to estimate $\xi_i$, then the confidence interval [1]$^1$ for $\xi_i$
will be determined by the familiar formula

(1)    $$ x_{i\cdot} - s_i t_\alpha(n) < \xi_i < x_{i\cdot} + s_i t_\alpha(n), $$

where $x_{i\cdot}$ denotes the mean of the $x_{i,j}$,

(2)    $$ s_i^2 = \sum_{j=1}^{n} (x_{i,j} - x_{i\cdot})^2 / n(n-1) $$

and $t_\alpha(n)$ is Fisher's t corresponding to the number of degrees of freedom
n - 1 and to the chosen confidence coefficient $\alpha$. It is known [2] that if
the estimate of $\xi_i$ is based only on its direct measurements
$x_{i,1}, x_{i,2}, \dots, x_{i,n}$, then the confidence interval (1) can not be
made any smaller; in fact, formula (1) gives the shortest unbiased confidence
interval for $\xi_i$. But if we try to substitute appropriate numbers in (1) we
get disconcerting results. Namely, if n = 2 and $\alpha$ = .99, then
$t_\alpha(n)$ = 63.657. If n is increased, the value of $t_\alpha(n)$ decreases
rapidly but for n = 5 it is still very considerable, $t_\alpha(5)$ = 4.604, and
consequently the numerical confidence interval determined by (1) is frequently so
broad that it is devoid of practical value.
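
The two values of t quoted above can be checked numerically. The following is only a sketch, not part of the original argument: the function name `t_quantile` and the integration scheme (the substitution $t = \sqrt{\nu}\tan\theta$ followed by Simpson's rule and bisection) are assumptions made for illustration, and the confidence coefficient $\alpha$ is taken to be two-sided, so that the quantile sought is at $(1+\alpha)/2$.

```python
import math

def t_quantile(df, p, steps=2000):
    """Value t with P(T <= t) = p for Student's t with df degrees of freedom (p > 1/2)."""
    # With t = sqrt(df) * tan(theta) the density becomes norm * cos(theta)^(df - 1).
    norm = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)

    def cdf_minus_half(theta):
        # Simpson's rule for norm * integral_0^theta cos(a)^(df - 1) da (steps is even).
        h = theta / steps
        total = 1.0 + math.cos(theta) ** (df - 1)
        for k in range(1, steps):
            total += (4 if k % 2 else 2) * math.cos(k * h) ** (df - 1)
        return norm * total * h / 3

    lo, hi = 0.0, math.pi / 2
    for _ in range(60):                      # bisection on the angle theta
        mid = (lo + hi) / 2
        if cdf_minus_half(mid) < p - 0.5:
            lo = mid
        else:
            hi = mid
    return math.sqrt(df) * math.tan((lo + hi) / 2)

alpha = 0.99
for n in (2, 3, 4, 5):
    # t_alpha(n) in the notation of (1): n - 1 degrees of freedom
    print(n, round(t_quantile(n - 1, (1 + alpha) / 2), 3))
```

For n = 2 and n = 5 this should reproduce the figures 63.657 and 4.604 given in the text.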

The general conclusion is that, if n cannot be increased, satisfactory estimates
of $\xi_i$ can only be obtained when they are based on something else in addition
to the direct measurements $x_{i,1}, x_{i,2}, \dots, x_{i,n}$. This point was
first noticed by “Student” [3]. His method of avoiding the difficulty consists in
assuming that the accuracy of measurements performed in the same laboratory is
constant in time, so that $\sigma_1 = \sigma_2 = \dots = \sigma_N = \sigma$. If
this is true, then $s_0^2 = \sum s_i^2 / N$ will be an unbiased estimate of the
variance of $x_{i\cdot}$, based on N(n - 1) degrees of freedom. If the past
experience of the laboratory is of any size, as measured by N, then the product
N(n - 1) will be of considerable size and the confidence interval for $\xi_i$

(3)    $$ x_{i\cdot} - s_0 t_\alpha(N(n-1)+1) < \xi_i < x_{i\cdot} + s_0 t_\alpha(N(n-1)+1) $$

will be much more satisfactory than (1).

The problem which arises is whether we are entitled to assume that
$\sigma_1 = \sigma_2 = \dots = \sigma_N$. The first study of this problem seems
to have been made by Przyborowski [4] in a paper written in Polish. His findings,
subsequently reported [5] in English, show that, at least in certain cases, the
accuracy of routine analyses is quite difficult to keep constant. If it is not
constant, then the relative frequency of the cases where formula (3) gives
correct statements about $\xi_i$ will generally be different from the expected
$\alpha$.
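
A small simulation may make this last remark concrete. It is a sketch under stated assumptions, not a result taken from the paper: the function name, the choice of two groups of standard errors (1 and 3), the true values $\xi_i = 0$, and the use of the normal quantile in place of Fisher's t with N(n - 1) degrees of freedom are all illustrative choices.

```python
import random
from statistics import NormalDist, mean

def coverage_of_pooled_interval(sigmas, n=2, alpha=0.99, reps=200, seed=0):
    """Relative frequency with which interval (3) covers xi_i, separately for each i."""
    rng = random.Random(seed)
    # Assumption: with N(n - 1) large, t is approximated by the normal quantile.
    z = NormalDist().inv_cdf((1 + alpha) / 2)
    N = len(sigmas)
    hits = [0] * N
    for _ in range(reps):
        means, s2 = [], []
        for sigma in sigmas:
            xs = [rng.gauss(0.0, sigma) for _ in range(n)]      # true xi_i = 0
            m = sum(xs) / n
            means.append(m)
            s2.append(sum((x - m) ** 2 for x in xs) / (n * (n - 1)))   # formula (2)
        s0 = (sum(s2) / N) ** 0.5                               # pooled estimate
        for i, m in enumerate(means):
            hits[i] += abs(m) < z * s0                          # does (3) cover xi_i?
    return [h / reps for h in hits]

cov = coverage_of_pooled_interval([1.0] * 100 + [3.0] * 100)
print("sigma = 1:", mean(cov[:100]), " sigma = 3:", mean(cov[100:]))
```

Under these assumptions the interval (3) covers almost always for the accurate analyses and noticeably less often than $\alpha$ for the inaccurate ones.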


1 Figures in square brackets refer to the literature quoted at the end of the paper. 
















The procedure employed by Przyborowski to test whether
$\sigma_1 = \sigma_2 = \dots = \sigma_N$ consisted in considering the quantities
$v_i = (n - 1)s_i^2$ and applying the $\chi^2$ test to see whether they follow
the same $\chi^2$ distribution with n - 1 degrees of freedom

(4)    $$ p(v) = c\, v^{\frac{1}{2}(n-3)} e^{-\frac{1}{2} v/\sigma^2} $$

with an unknown $\sigma$.

Just this point is to be the main subject of this paper. The $\chi^2$ test was
devised by Karl Pearson with no particular set of alternative hypotheses in view.
As a result we may expect that in many cases other tests may be devised which
would be more powerful. A number of such cases are already on record [6], [7],
[8].


2. Statistical hypothesis H to be tested. We shall consider the case where
we can observe the particular values of Nn random variables $x_{i,j}$,
i = 1, 2, ..., N; j = 1, 2, ..., n, and we know that $x_{i,j}$ is independent of
$x_{k,l}$ for $i \neq k$ and that

(5)    $$ p(x_{i,1}, \dots, x_{i,n}) = \left(\frac{1}{\sigma_i\sqrt{2\pi}}\right)^{n} e^{-\frac{1}{2}\sum_{j=1}^{n}(x_{i,j}-\xi_i)^2/\sigma_i^2}, $$

with unknown values of $\xi_i$ and $\sigma_i > 0$. The hypothesis H to be tested
is that $\sigma_1 = \sigma_2 = \dots = \sigma_N = \sigma$ without specifying,
however, the actual value of $\sigma$. It will be noticed that this hypothesis
has already been treated by a number of authors [9]-[17]. The need for
considering it again arises from the fact that previously it was tested against
the set of alternatives presuming that the $\sigma_1, \sigma_2, \dots, \sigma_N$
were positive constants having any values whatsoever. It seems to the author
that, in the present case, the set of alternatives should be different. This
will be explained in the next section. It follows that while the hypothesis
tested is the same as in the papers quoted above, the problem of testing it is
quite different.

Let us denote by E the whole set of Nn observable variables. If H is true
then their elementary probability law will be

(6)    $$ p(E \mid H) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{Nn} e^{-\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{n}(x_{i,j}-\xi_i)^2/\sigma^2}. $$


3. General problem of similar regions. The development of the test will
follow the general lines explained elsewhere [18], [19], [20]. Denoting by W the
Nn dimensional space of the $x_{i,j}$'s, we want to determine a region w in W
having the following properties: (a) if the hypothesis tested is true then the
probability of E falling in w shall have some fixed value chosen in advance,
e.g., $\varepsilon$ = .05 or $\varepsilon$ = .01. This probability is known as
the probability of an error of the first kind. (b) If H is not true then the
probability of E falling in w as determined by one of the alternative hypotheses
(that we assume likely to be true when H is false) shall be as large as possible
in a sense that requires further explanation. The probability with which this
condition is concerned is a complement of the probability of an error of the
second kind. Once the region w is chosen it will be used to test H in this way:
if E falls within w, then H will be rejected.

In the present section we shall deal only with ways of satisfying condition 
(a). The problem is similar to the one recently described by Hotelling [21]. 
The difficulty is that, if H is true, the probability law of E is given by (6) and 
contains N + 1 unspecified parameters, “nuisance” parameters as Hotelling 
very appropriately calls them. If we take just any region w then it is most 
likely that the probability of E falling in it will vary with different values of 
$\sigma, \xi_1, \dots, \xi_N$. As a matter of fact, if we want the test to be absolutely most
powerful, or at least relatively so, we must determine not just one single region 
satisfying (a) but actually all such regions or some broad family of them. From 
these we shall then select one which seems most satisfactory from the point of 
view of (b). 

Systematic methods of determining regions of the above kind have already 
been considered [18], [20], [2]. In these publications they are called “similar” 
to the sample space W. The reason for this term is that the whole space W does 
possess the required properties with $\varepsilon$ = 1. In fact, whatever be the values of
the nuisance parameters, $\sigma, \xi_1, \dots, \xi_N$, the probability of E falling within W,
as calculated from (6), is perfectly determined and equals 1. Our problem is
to find a region w, part of W, with similar properties for $0 < \varepsilon < 1$. However,
in many cases no such regions exist [22]. 

The general methods in the above publications are applicable in the present 
case. However, a recent paper by Cramér and Wold [23] allows a slight
improvement in presenting the matter. As this is a little involved, it seems
desirable to take up the whole problem and present it anew.

Consider then the general case where the probability law of some m observable
variables $y_1, y_2, \dots, y_m$, say $p(E \mid \theta_1, \dots, \theta_s)$, as
specified by the hypothesis tested, depends on s nuisance parameters
$\theta_1, \theta_2, \dots, \theta_s$. Our problem will consist of determining
the necessary and sufficient conditions for a region w to be similar to the
sample space with respect to all these parameters. We shall assume that the
probability law $p(E \mid \theta_1, \dots, \theta_s)$ satisfies certain limiting
conditions.

Let

(7)    $$ \varphi_i = \frac{\partial \log p}{\partial \theta_i}, $$

(8)    $$ \varphi_{i,j} = \frac{\partial^2 \log p}{\partial \theta_i\, \partial \theta_j}. $$

Assume that for all values of i and j = 1, 2, ..., s

(9)    $$ \varphi_{i,j} = A_{i,j} + \sum_{k=1}^{s} B_{i,j,k}\, \varphi_k $$

where the coefficients $A_{i,j}$ and $B_{i,j,k}$ are independent of the
observable variables E. Assume also that the probability law
$p(E \mid \theta_1, \dots, \theta_s)$ permits indefinite differentiation under
the sign of the integral taken over any fixed region w in W. It is easy to check
that the probability law (6) satisfies all of these conditions.

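In order to find the necessary conditions for the region w to be similar to W
with respect to $\theta_1, \theta_2, \dots, \theta_s$ assume that w is actually
similar and that, consequently,

(10)    $$ P\{E \in w \mid \theta_1, \dots, \theta_s\} = \int \cdots \int_w p(E \mid \theta_1, \dots, \theta_s)\, dy_1 \cdots dy_m = \varepsilon $$

for all possible values of $\theta_1, \theta_2, \dots, \theta_s$. It follows
that the derivatives of all orders with respect to
$\theta_1, \theta_2, \dots, \theta_s$ taken from the left side of (10) must be
identically equal to zero. But we have

(11)    $$ \begin{aligned} \frac{\partial}{\partial \theta_i} \int \cdots \int_w p(E \mid \theta_1, \dots, \theta_s)\, dy_1 \cdots dy_m &= \int \cdots \int_w \frac{\partial}{\partial \theta_i}\, p(E \mid \theta_1, \dots, \theta_s)\, dy_1 \cdots dy_m \\ &= \int \cdots \int_w \varphi_i\, p(E \mid \theta_1, \dots, \theta_s)\, dy_1 \cdots dy_m = 0 \end{aligned} $$

for i = 1, 2, ..., s. Similarly, using (9)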


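(12)    $$ \begin{aligned} \frac{1}{\varepsilon} \frac{\partial^2}{\partial \theta_i\, \partial \theta_j} &\int \cdots \int_w p(E \mid \theta_1, \dots, \theta_s)\, dy_1 \cdots dy_m \\ &= \frac{1}{\varepsilon} \int \cdots \int_w \Big( \varphi_i \varphi_j + A_{i,j} + \sum_{k=1}^{s} B_{i,j,k}\, \varphi_k \Big)\, p(E \mid \theta_1, \dots, \theta_s)\, dy_1 \cdots dy_m = 0. \end{aligned} $$

Using (10) and (11), the last identity will be reduced to

(13)    $$ \frac{1}{\varepsilon} \int \cdots \int_w \varphi_i \varphi_j\, p(E \mid \theta_1, \dots, \theta_s)\, dy_1 \cdots dy_m = -A_{i,j} \quad \text{for } i, j = 1, 2, \dots, s $$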


where the right side does not depend on the particular region w, provided that
w is similar to the sample space. Considering the identities (11) and (13)
which were obtained by differentiating (10) twice, we may guess what will
happen if we differentiate (13) again and again. We may assume, in fact, that,
whatever be the non-negative integers $k_1, k_2, \dots, k_s$, we shall obtain

(14)    $$ \frac{1}{\varepsilon} \int \cdots \int_w \prod_{i=1}^{s} \varphi_i^{k_i}\, p(E \mid \theta_1, \dots, \theta_s)\, dy_1 \cdots dy_m = M(k_1, k_2, \dots, k_s), $$

where $M(k_1, \dots, k_s)$ is independent of the particular region w, provided
that w is similar to the sample space with respect to all of the $\theta$'s.
Assume that this is found for all k's such that $\sum_{i=1}^{s} k_i \le K$; also
assume that the sum of the k's in (14) is exactly K. Differentiating with
respect to $\theta_j$, we obtain
(15)    $$ \frac{1}{\varepsilon} \int \cdots \int_w \Big\{ \varphi_j \prod_{i=1}^{s} \varphi_i^{k_i} + \prod_{i=1}^{s} \varphi_i^{k_i} \sum_{t=1}^{s} k_t\, \varphi_t^{-1} \varphi_{t,j} \Big\}\, p(E \mid \theta_1, \dots, \theta_s)\, dy_1 \cdots dy_m = \frac{\partial}{\partial \theta_j} M(k_1, \dots, k_s). $$

Because of the particular form of $\varphi_{i,j}$, the second expression in the
curly brackets under the integral is a polynomial in the $\varphi$'s of order not
exceeding K. According to the assumption made, this expression multiplied by
$p(E \mid \theta_1, \dots, \theta_s)/\varepsilon$ and integrated over w gives a
result which is independent of w. As the right side of (15) is also independent
of w, we conclude that

(16)    $$ \frac{1}{\varepsilon} \int \cdots \int_w \varphi_j \prod_{i=1}^{s} \varphi_i^{k_i}\, p(E \mid \theta_1, \dots, \theta_s)\, dy_1 \cdots dy_m = M(k_1, \dots, k_j + 1, \dots, k_s) $$

is also independent of the particular similar region chosen. We have seen that
(14) is true for $K \le 2$ and that if it is true for K it is true for K + 1,
that is, it is true in general.

We may now sum up our findings: if w is a region similar to the sample space
with respect to all of the $\theta$'s and if $\varepsilon$ denotes the value of
the integral (10), then, whatever be the non-negative integers
$k_1, k_2, \dots, k_s$, the value of the integral on the left side of (14) is
independent of the particular region w chosen.

As the whole sample space W is also “similar” with $\varepsilon$ = 1, it must
satisfy this identity. This allows us to determine the M's, namely

(17)    $$ \int \cdots \int_W \prod_{i=1}^{s} \varphi_i^{k_i}\, p(E \mid \theta_1, \dots, \theta_s)\, dy_1 \cdots dy_m = M(k_1, \dots, k_s). $$

It is obvious that the necessary condition above is also sufficient. If a region
w is such that (14) holds for all systems of non-negative integers then all the
derivatives of (10) must be identically zero; thus the left side of (10) is
independent of $\theta_1, \theta_2, \dots, \theta_s$.

It will be useful to interpret the above conditions as follows. We start by
noticing that the left side of (17) represents the product moment of some
specified order of the $\varphi_1, \varphi_2, \dots, \varphi_s$ considered as
random variables. We shall call it the absolute product moment. We will now
interpret the left side of (14) as a product moment also. For this purpose we
shall define a new elementary probability law of the y's to be denoted by
$p(E \mid w, \theta_1, \dots, \theta_s)$ and described as the relative
probability law given w. We shall write it as

(18)    $$ p(E \mid w, \theta_1, \dots, \theta_s) = \frac{1}{\varepsilon}\, p(E \mid \theta_1, \dots, \theta_s) $$

for all of the points E included in w and

(19)    $$ p(E \mid w, \theta_1, \dots, \theta_s) = 0 $$

for all other points. With this definition the left side of (14) appears to be
the expectation of the product $\varphi_1^{k_1} \cdots \varphi_s^{k_s}$
calculated from the relative probability law of the y's given w. We will call it
the relative product moment given w. The final result can now be stated as
follows:

For a region w to be similar to the sample space with respect to
$\theta_1, \theta_2, \dots, \theta_s$ it is necessary and sufficient that all the
relative moments and product moments of $\varphi_1, \varphi_2, \dots, \varphi_s$
shall equal the corresponding absolute moments and product moments.
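
The statement can be illustrated on a much simpler law than (6). The following sketch assumes two independent observations $y_1, y_2$, normal about an unknown $\theta$ with unit standard error, so that $\varphi = (y_1 - \theta) + (y_2 - \theta)$, $A = -2$ and $B = 0$ in (9). The region $|y_1 - y_2| > c$ is chosen here only because it is known to be similar with respect to $\theta$. The function name and all numerical settings are assumptions made for illustration.

```python
import random
from statistics import NormalDist

def relative_moments(theta, eps=0.05, reps=100_000, seed=1):
    """Monte Carlo estimates of P{E in w} and of the first two relative moments of phi."""
    rng = random.Random(seed)
    # y1 - y2 is normal with variance 2 whatever theta is; c gives P(|y1 - y2| > c) = eps.
    c = NormalDist(0.0, 2 ** 0.5).inv_cdf(1 - eps / 2)
    m1 = m2 = 0.0
    count = 0
    for _ in range(reps):
        y1, y2 = rng.gauss(theta, 1.0), rng.gauss(theta, 1.0)
        if abs(y1 - y2) > c:                     # the sample point falls in w
            phi = (y1 - theta) + (y2 - theta)    # d log p / d theta
            m1 += phi
            m2 += phi * phi
            count += 1
    return count / reps, m1 / count, m2 / count

for theta in (0.0, 7.5):
    print(theta, relative_moments(theta))
```

Up to sampling error, the three numbers should be near $\varepsilon$ = .05, 0 and 2 for any $\theta$, the last two being the absolute moments of $\varphi$ and the value 2 agreeing with $-A$ in (13).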

In order to make the method of constructing similar regions according to the 
above conditions clear we recall the procedure involved in the calculation of the 
probability laws of any given set of random variables. 

Assume then that the elementary probability law of the original variables is
given. Fix some values of the parameters $\theta_1, \theta_2, \dots, \theta_s$,
denote the resulting probability law by p(E), and consider the problem of finding
the elementary probability law of $\varphi_1, \varphi_2, \dots, \varphi_s$
considered as functions of the y's. We shall assume that none of the $\varphi$'s
can be expressed as a function of the others not involving the y's explicitly so
that the matrix

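(20)    $$ \left\| \frac{\partial \varphi_i}{\partial y_j} \right\|, \qquad i = 1, 2, \dots, s; \quad j = 1, 2, \dots, m $$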


is non-singular. In these circumstances it is possible to select m - s functions
of the y's, say $\psi_{s+1}, \psi_{s+2}, \dots, \psi_m$, which have continuous
second derivatives such that the formulae

(21)    $$ \begin{aligned} z_i &= \varphi_i, & i &= 1, 2, \dots, s \\ z_j &= \psi_j, & j &= s+1, \dots, m \end{aligned} $$

determine a one-to-one transformation of the space W of the y's into the space
W' of the z's. If w denotes any region in W then it will be transformed into a
perfectly determined region w' in W'. If E' denotes a point in W' then the
probability of E' falling in w' will be identical with that of E falling in w.
Thus

(22)    $$ P\{E' \in w'\} = P\{E \in w\} = \int \cdots \int_w p(E)\, dy_1 \cdots dy_m. $$


Letting J be the Jacobian of the y's with respect to the z's in the transformation
(21) and using the known formulae for transforming multiple integrals, we have

(23)    $$ P\{E' \in w'\} = \int \cdots \int_{w'} p(E)\big|_z\, |J|\, dz_1 \cdots dz_m $$

where $p(E)|_z$ denotes the result of substituting the expressions for the y's
in terms of the z's as obtained from (21) into p(E). It follows that, whatever be
the region w' in W', the probability of E' falling in it is obtained by
integrating the function $p(E)|_z\, |J|$ over w'. But this means, according to
the usual definition, that the product $p(E)|_z\, |J|$ is the elementary
probability law of the z's. Denoting it by $p(E') = p(z_1, \dots, z_m)$ we have

(24)    $$ p(E') = p(E)\big|_z \cdot |J|. $$

Now, to obtain the joint probability law of
$\varphi_1, \varphi_2, \dots, \varphi_s$ or that of $z_1, z_2, \dots, z_s$ we
must integrate p(E') for all the other z's between their extreme limits,
formally between $-\infty$ and $+\infty$ for each of the variables concerned,

(25)    $$ p(\varphi_1, \dots, \varphi_s) = \int \cdots \int p(E')\, dz_{s+1} \cdots dz_m. $$


This procedure will be applied when calculating the absolute probability law
of the $\varphi$'s and also the relative one given w. The only difference will be
that in the latter case we shall have to start with (18) and (19) instead of the
original probability law. The space W' and the transformation (21) will be the
same in both cases. It is important to be clear about the difference between the
two cases. This is connected with the difference between
$p(E \mid \theta_1, \dots, \theta_s)$ and
$p(E \mid w, \theta_1, \dots, \theta_s)$ of (18) and (19). The latter is
proportional to the former at any point E within the region w but is zero outside
of w. As mentioned above, the integrations for $z_{s+1}, z_{s+2}, \dots, z_m$ in
(25) should extend formally from $-\infty$ to $+\infty$ for each variable.
However, the probability law p(E') may equal zero within certain parts of this
range. Fixing any system of values $z_i = \varphi_i$, for i = 1, 2, ..., s, is
equivalent to fixing a hypersurface in the space W and considering the
intersection of planes $z_i$ = constant in the space W'. Denote them by
$W(\varphi)$ and $W'(\varphi)$, respectively. If we shift the point E or E'
along $W(\varphi)$ or $W'(\varphi)$ respectively, the variables
$z_j = \psi_j$, for j = s + 1, s + 2, ..., m will assume a certain set
$S(\varphi)$ of systems of values. When calculating the absolute probability law
of $\varphi_1, \dots, \varphi_s$ this set $S(\varphi)$ will be the real region of
integration in (25); outside of it the function under the integral sign will be
zero. On the other hand, when calculating the relative probability law of
$\varphi_1, \dots, \varphi_s$ given w, the function under the integral (25) is
zero as soon as the point E moves outside of the region w. Denote by
$w(\varphi)$ that part of $W(\varphi)$ which is included in w and by
$w'(\varphi)$ the corresponding part of $W'(\varphi)$. So, the absolute and the
relative, given w, probability laws of $\varphi_1, \dots, \varphi_s$ can be
obtained by using the formulae

(26)    $$ p(\varphi_1, \dots, \varphi_s) = \int \cdots \int_{W'(\varphi)} p(E')\, dz_{s+1} \cdots dz_m, $$

(27)    $$ p(\varphi_1, \dots, \varphi_s \mid w) = \frac{1}{\varepsilon} \int \cdots \int_{w'(\varphi)} p(E')\, dz_{s+1} \cdots dz_m. $$


Now the method of constructing regions similar to W with respect to
$\theta_1, \theta_2, \dots, \theta_s$ is clear: to construct any such region it
is necessary and sufficient to select for each of all possible systems of values
of $\varphi_1, \varphi_2, \dots, \varphi_s$ a part $w(\varphi)$ of the
hypersurface $W(\varphi)$ and to combine all these parts. The selection of
$w(\varphi)$ is arbitrary save for the restriction that the probability law (27)
have all its moments equal to those of (26), identically in the $\theta$'s. This
last condition will certainly be satisfied if $w(\varphi)$ is so selected that
for almost all systems of values of $\varphi_1, \varphi_2, \dots, \varphi_s$

(28)    $$ p(\varphi_1, \dots, \varphi_s \mid w) = p(\varphi_1, \dots, \varphi_s) $$

for all values of the $\theta$'s.

By selecting $w(\varphi)$ in all possible ways that satisfy (28) we obtain an
infinity of regions similar to W with respect to
$\theta_1, \theta_2, \dots, \theta_s$. They form a family which we shall denote
by $F(\varepsilon)$. However, it is known that in general all the moments of
$p(\varphi_1, \dots, \varphi_s \mid w)$ and $p(\varphi_1, \dots, \varphi_s)$ may
be identical without the two probability laws being equal almost everywhere. In
such cases, the family $F(\varepsilon)$ will not exhaust all the similar regions.
It is important to be able to state whether or not $F(\varepsilon)$ contains all
the similar regions. To ascertain this we may use the conditions of Cramér and
Wold [23] which are sufficient for the determinateness of the problem of moments,
that is, for the uniqueness of a function having a given set of moments.

Let

(29)    $$ \mu_\nu = M(\nu, 0, 0, \dots, 0) + M(0, \nu, 0, \dots, 0) + \dots + M(0, 0, \dots, 0, \nu). $$

With this notation the conditions of Cramér and Wold can be stated as follows:
If any two probability laws, e.g., the probability laws
$p(\varphi_1, \dots, \varphi_s \mid w)$ and $p(\varphi_1, \dots, \varphi_s)$,
have all their moments and all their product moments identical and if the series

(30)    $$ \sum_{\nu=1}^{\infty} \mu_{2\nu}^{-1/2\nu} $$

is divergent, then

(31)    $$ p(\varphi_1, \dots, \varphi_s \mid w) = p(\varphi_1, \dots, \varphi_s) $$

almost everywhere.

Therefore, to know whether the family $F(\varepsilon)$ defined above exhausts all
the regions similar to W, we must calculate the even moments of all the
$\varphi_i$ and see whether the series (30) depending on these moments is
divergent. If it is, there is no similar region besides the family
$F(\varepsilon)$. Otherwise, there may be some others. These others will be
constructed by selecting $w(\varphi)$'s such that the integral (27) equals any
other probability law having the same moments as (26). In such cases, a region w
selected, in one way or another, from the family $F(\varepsilon)$ as the best
from the point of view of controlling errors of the second kind will only be the
relative best.

It should be mentioned that whether we can always, under the conditions
considered, select a $w(\varphi)$ on any $W(\varphi)$ that satisfies the identity
(28) has not yet been proved. However, it seems plausible that the differential
equations (9) imply the existence of a sufficient set of statistics for
$\theta_1, \theta_2, \dots, \theta_s$. If this is so, the possibility of
satisfying (28) is guaranteed (see [2], p. 366).







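4. Regions similar to the sample space with respect to
$\sigma, \xi_1, \xi_2, \dots, \xi_N$. We may now return to the original problem
and apply our theory to the probability law (6). We wish to construct the most
general regions similar to the sample space with respect to the nuisance
parameters $\sigma, \xi_1, \dots, \xi_N$ unspecified by the hypothesis tested.
We let

(32)    $$ \varphi_\sigma = \frac{\partial \log p}{\partial \sigma} = -\frac{Nn}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{N} \sum_{j=1}^{n} (x_{i,j} - \xi_i)^2, $$

(33)    $$ \varphi_i = \frac{\partial \log p}{\partial \xi_i} = \frac{n}{\sigma^2}\, (x_{i\cdot} - \xi_i). $$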
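Then

(34)    $$ \begin{aligned} \frac{\partial \varphi_\sigma}{\partial \sigma} &= -\frac{2Nn}{\sigma^2} - \frac{3}{\sigma}\, \varphi_\sigma, & \frac{\partial \varphi_\sigma}{\partial \xi_i} &= \frac{\partial \varphi_i}{\partial \sigma} = -\frac{2}{\sigma}\, \varphi_i, \\ \frac{\partial \varphi_i}{\partial \xi_i} &= -\frac{n}{\sigma^2}, & \frac{\partial \varphi_i}{\partial \xi_j} &= 0 \quad (i \neq j), \end{aligned} $$

and we see that the probability law (6) satisfies the differential equations (9).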
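
That the derivatives of $\log p$ for the law (6) are linear in the first derivatives, as (9) requires, can be checked by finite differences. The sketch below is illustrative only: the function names, the step h and the numerical data are assumptions, and the relations tested are $\partial\varphi_\sigma/\partial\sigma = -2Nn/\sigma^2 - 3\varphi_\sigma/\sigma$, $\partial\varphi_i/\partial\sigma = \partial\varphi_\sigma/\partial\xi_i = -2\varphi_i/\sigma$, $\partial\varphi_i/\partial\xi_i = -n/\sigma^2$, and zero for the remaining cross terms.

```python
def phi(x, sigma, xi):
    """First derivatives of log p(E | H): returns (phi_sigma, [phi_1, ..., phi_N])."""
    N, n = len(x), len(x[0])
    q = sum((v - xi[i]) ** 2 for i in range(N) for v in x[i])
    phi_sigma = -N * n / sigma + q / sigma ** 3
    phi_xi = [sum(v - xi[i] for v in x[i]) / sigma ** 2 for i in range(N)]
    return phi_sigma, phi_xi

def max_violation_of_9(x, sigma, xi, h=1e-5):
    """Largest gap between finite-difference second derivatives and their linear forms."""
    N, n = len(x), len(x[0])
    ps, px = phi(x, sigma, xi)
    # central differences with respect to sigma
    up, dn = phi(x, sigma + h, xi), phi(x, sigma - h, xi)
    errs = [abs((up[0] - dn[0]) / (2 * h) - (-2 * N * n / sigma ** 2 - 3 * ps / sigma))]
    errs += [abs((a - b) / (2 * h) - (-2 * p / sigma)) for a, b, p in zip(up[1], dn[1], px)]
    # central differences with respect to xi_1 (the other xi_i behave alike; needs N >= 2)
    up = phi(x, sigma, [xi[0] + h] + xi[1:])
    dn = phi(x, sigma, [xi[0] - h] + xi[1:])
    errs.append(abs((up[0] - dn[0]) / (2 * h) - (-2 * px[0] / sigma)))
    errs.append(abs((up[1][0] - dn[1][0]) / (2 * h) - (-n / sigma ** 2)))
    errs.append(abs((up[1][1] - dn[1][1]) / (2 * h)))      # cross term should vanish
    return max(errs)

data = [[4.1, 3.7, 5.2], [10.3, 9.8, 10.9]]                # hypothetical measurements
print(max_violation_of_9(data, sigma=0.8, xi=[4.0, 10.0]))
```

The printed discrepancy should be of the order of the rounding error of the finite differences.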
Now the hypersurfaces $W(\varphi)$ of the theory are the intersections of the
hypersurfaces

(35)    $$ \varphi_\sigma = \text{constant} \quad \text{and} \quad \varphi_i = \text{constant}, \quad \text{for } i = 1, 2, \dots, N. $$

The latter equations are clearly equivalent to

(36)    $$ x_{i\cdot} = \text{constant}. $$

As to the former, we notice the identity

(37)    $$ \sum_{i=1}^{N} \sum_{j=1}^{n} (x_{i,j} - \xi_i)^2 = n \sum_{i=1}^{N} \left( S_i^2 + (x_{i\cdot} - \xi_i)^2 \right) = \chi^2, \quad \text{(say)} $$

where $n S_i^2 = \sum_{j=1}^{n} (x_{i,j} - x_{i\cdot})^2$. Therefore,
$W(\varphi)$ denotes the intersection of the hypersurfaces (36) with, say,

(38)    $$ T_1 = \sum_{i=1}^{N} S_i^2 = \text{constant}. $$

If we succeed in selecting from each hypersurface $W(\varphi)$ a part
$w(\varphi)$ satisfying condition (28) identically then the sum of all such
regions $w(\varphi)$ will form a region w similar to W with respect to all the
unspecified parameters and belonging to the family $F(\varepsilon)$. Before
proceeding to this stage of the solution, let us see whether the family
$F(\varepsilon)$ exhausts all of the similar regions.







For this purpose notice first that instead of considering whether there is but
one probability law with moments equal to those of $\varphi_\sigma$ and the
$\varphi_i$'s, it is sufficient to concern ourselves with the moments of
$\chi^2$ and $x_{i\cdot}$. In fact, all the $\varphi$'s are functions of these
variables and the problem of uniqueness of the distribution must have the same
answer in both cases. The $2\nu$th absolute moment of $\chi^2$ as calculated
from (6) equals

(39)    $$ (2\sigma^2)^{2\nu}\, \Gamma(\tfrac{1}{2}Nn + 2\nu) / \Gamma(\tfrac{1}{2}Nn). $$

The moment of the same order of $x_{i\cdot}$, taken about $\xi_i$, is

(40)    $$ \sigma^{2\nu} (2\nu)! / (2n)^{\nu} \nu!. $$

Thus, the quantity denoted by $\mu_{2\nu}$ in the theory becomes

(41)    $$ \mu_{2\nu} = \frac{(2\sigma^2)^{2\nu}\, \Gamma(\tfrac{1}{2}Nn + 2\nu)}{\Gamma(\tfrac{1}{2}Nn)} + N \left( \frac{\sigma^2}{2n} \right)^{\nu} \frac{(2\nu)!}{\nu!}. $$

We are interested in whether or not the series (30) is divergent. Since
$\mu_{2\nu}$ satisfies the inequality

(42)    $$ \mu_{2\nu} < a^{2\nu}\, \Gamma(b + 2\nu) = C_{2\nu}, \quad \text{(say)} $$

with $a = 2\sigma^2 + N$ and 2b = Nn, if we prove that the series
$\sum C_{2\nu}^{-1/2\nu}$ diverges, then (30) also diverges. To settle this
conveniently we apply Stirling's formula to $\Gamma(b + 2\nu)$ and find that, as
$\nu \to \infty$, the ratio $C_{2\nu}^{1/2\nu}/\nu$ tends to a finite limit. As
the series $\sum \nu^{-1}$ is divergent, so is the series
$\sum C_{2\nu}^{-1/2\nu}$ and thus the series $\sum \mu_{2\nu}^{-1/2\nu}$ is
divergent. Therefore, there is but one probability law with moments identical to
those of $\chi^2$ and the $x_{i\cdot}$'s and so the family $F(\varepsilon)$
contains all the regions similar to the sample space with respect to
$\sigma, \xi_1, \dots, \xi_N$.
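
The limiting behaviour invoked here can be seen numerically. The sketch below assumes $C_{2\nu} = a^{2\nu}\Gamma(b + 2\nu)$, for which Stirling's formula suggests that $C_{2\nu}^{1/2\nu}/\nu$ tends to $2a/e$. The function name and the sample values of a and b are illustrative assumptions, not values taken from the paper.

```python
import math

def root_ratio(nu, a, b):
    """C_{2 nu}^{1/(2 nu)} / nu for C_{2 nu} = a^(2 nu) * Gamma(b + 2 nu), via logarithms."""
    log_root = math.log(a) + math.lgamma(b + 2 * nu) / (2 * nu)
    return math.exp(log_root) / nu

a, b = 2 * 1.5 ** 2 + 10, 10.0        # hypothetical sigma = 1.5, N = 10, n = 2
for nu in (10, 1000, 10 ** 6):
    print(nu, root_ratio(nu, a, b), 2 * a / math.e)
```

Since the ratio settles near a positive constant, the terms $C_{2\nu}^{-1/2\nu}$ behave like a constant multiple of $1/\nu$, which is the comparison used to establish divergence.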

It may now be interesting to go into some details of the effective construction
of any region similar to W with respect to $\sigma, \xi_1, \dots, \xi_N$. For
this purpose it is convenient to go back and express the identity (28), that the
regions $w(\varphi)$ must satisfy, in terms of the relative probability law of
$z_{s+1}, z_{s+2}, \dots, z_m$ given $\varphi_1, \varphi_2, \dots, \varphi_s$.
This is denoted by
$p(z_{s+1}, z_{s+2}, \dots, z_m \mid \varphi_1, \dots, \varphi_s)$ and defined
for every system of values of the $\varphi$'s for which
$p(\varphi_1, \varphi_2, \dots, \varphi_s) \neq 0$ as follows:

(43)    $$ p(z_{s+1}, z_{s+2}, \dots, z_m \mid \varphi_1, \varphi_2, \dots, \varphi_s) = p(\varphi_1, \dots, \varphi_s, z_{s+1}, \dots, z_m) / p(\varphi_1, \varphi_2, \dots, \varphi_s). $$

Using (26), (27), and (43), the identity (28) can be rewritten in the following
form

(44)    $$ \int \cdots \int_{w'(\varphi)} p(z_{s+1}, \dots, z_m \mid \varphi_1, \dots, \varphi_s)\, dz_{s+1} \cdots dz_m = \varepsilon. $$

The function under this integral is the relative elementary probability law
of $z_{s+1}, z_{s+2}, \dots, z_m$ and it is integrated over the region
$w'(\varphi)$. Therefore, the left side of (44) is nothing but the relative
probability of the point E' falling in $w'(\varphi)$ given that the first s of
its coordinates have the fixed values $\varphi_1, \varphi_2, \dots, \varphi_s$.
In other words, and owing to the one-to-one correspondence between the spaces
W and W', we have

(45)    $$ P\{E' \in w'(\varphi) \mid E' \in W'(\varphi)\} = P\{E \in w(\varphi) \mid E \in W(\varphi)\} = \varepsilon. $$


Now the general method of determining similar regions may be stated as 
follows: 

1. Choose any system of variables $z_{s+1}, z_{s+2}, \dots, z_m$ such that their
values determine uniquely the position of the point E' on any fixed hypersurface
$W'(\varphi)$. These z's considered as functions of the y's should be
continuously differentiable twice.

2. Find the relative probability law of the z's given the $\varphi$'s. This must
be done for every possible set of values of the $\varphi$'s.

3. In the space of $z_{s+1}, z_{s+2}, \dots, z_m$ consider regions which satisfy
the equality (44) identically in the $\theta$'s. Any such region could be taken
to form a part of w', the region similar to the sample space, which we are trying
to construct. If the assumption that the differential equations (9) imply the
existence of a sufficient system of statistics for
$\theta_1, \theta_2, \dots, \theta_s$ is true, then (see [2], p. 366) the
probability law $p(z_{s+1}, z_{s+2}, \dots, z_m \mid \varphi_1, \dots, \varphi_s)$
will be independent of the $\theta$'s and there will be an infinity of regions
satisfying (44).

Obviously, instead of dealing directly with
$\varphi_1, \varphi_2, \dots, \varphi_s$ as described above, we may select any
system of statistics $T_1, T_2, \dots, T_s$ such that the system of equations
$T_i$ = constant is equivalent to $\varphi_i$ = constant, for i = 1, 2, ..., s.

Returning to the particular problem of similar regions with respect to
$\sigma, \xi_1, \dots, \xi_N$, we notice that instead of the $\varphi$'s we may
consider

(46)    $$ T_1 = \sum_{i=1}^{N} S_i^2 \quad \text{and} \quad T_{i+1} = x_{i\cdot} \quad \text{for } i = 1, 2, \dots, N. $$

Now we wish to select a convenient system of variables, denoted by $z_{s+i}$'s in
the theory above, to determine the position of the point E' on any hypersurface
$W'(\varphi)$ where all the functions (46) have fixed values. Obviously there is
no unique choice and we shall use what we find convenient. But notice that the
total number of these variables should be, in our case, Nn - N - 1. The
following system may be suggested.

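If the sum $\sum S_i^2$ has a fixed value $T_1$, then none of the $S_i^2$ can
exceed $T_1$. Write

(47)    $$ \begin{aligned} S_i^2 &= u_i T_1 \quad \text{for } i = 1, 2, \dots, N-1, \\ S_N^2 &= \Big( 1 - \sum_{i=1}^{N-1} u_i \Big) T_1 \end{aligned} $$

and consider $u_1, u_2, \dots, u_{N-1}$ as belonging to the system of variables
sought. The region of their variation is determined by the inequalities

(48)    $$ 0 < u_i \quad \text{and} \quad \sum_{i=1}^{N-1} u_i < 1. $$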
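
The statistics (46) and the variables u of (47) are simple to compute from a table of measurements. The following sketch is illustrative only: the function name, the layout of the data as a list of rows (one row per item measured) and the numerical values are assumptions.

```python
def pooled_statistics(x):
    """x[i][j] is the j-th measurement on item i. Returns (T, u), T = [T_1, ..., T_{N+1}]."""
    n = len(x[0])
    means = [sum(row) / n for row in x]                      # x_i. = T_{i+1}
    S2 = [sum((v - m) ** 2 for v in row) / n                 # n S_i^2 = sum (x_ij - x_i.)^2
          for row, m in zip(x, means)]
    T1 = sum(S2)
    u = [s / T1 for s in S2[:-1]]                            # S_i^2 = u_i T_1, i < N
    return [T1] + means, u

data = [[4.1, 3.7, 5.2], [10.3, 9.8, 10.9], [7.0, 7.4, 6.1]]  # hypothetical measurements
T, u = pooled_statistics(data)
print(T, u, 1 - sum(u))     # the last number is S_N^2 / T_1
```

Except when all measurements on some item coincide, the values so obtained satisfy the inequalities (48).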














If the u's are fixed then they, together with the value of $T_1$, determine the
values of $S_1, S_2, \dots, S_N$. As the values of $x_{i\cdot} = T_{i+1}$ are
already fixed, we have to solve the problem of choosing for each
i = 1, 2, ..., N a system of n - 2 variables, say
$z_{i,1}, z_{i,2}, \dots, z_{i,n-2}$, which with $x_{i\cdot}$ and $S_i$ will
completely determine the values of $x_{i,1}, x_{i,2}, \dots, x_{i,n}$. However,
this will only have to be done if n > 2. Following the now familiar method (see,
for example, [5], pp. 33-43), we may determine the $z_{i,j}$ in two consecutive
steps. First write


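(49)    $$ \begin{aligned} x_{i,1} &= x_{i\cdot} + \frac{1}{\sqrt{1 \cdot 2}}\, v_{i,1} + \frac{1}{\sqrt{2 \cdot 3}}\, v_{i,2} + \dots + \frac{1}{\sqrt{(n-1)n}}\, v_{i,n-1} \\ x_{i,2} &= x_{i\cdot} - \frac{1}{\sqrt{1 \cdot 2}}\, v_{i,1} + \frac{1}{\sqrt{2 \cdot 3}}\, v_{i,2} + \dots + \frac{1}{\sqrt{(n-1)n}}\, v_{i,n-1} \\ &\;\;\vdots \\ x_{i,n} &= x_{i\cdot} - \frac{n-1}{\sqrt{(n-1)n}}\, v_{i,n-1} \end{aligned} $$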


where $v_{i,1}, v_{i,2}, \dots, v_{i,n-1}$ are new variables satisfying the
identity

(50)    $$ \sum_{k=1}^{n-1} v_{i,k}^2 = \sum_{j=1}^{n} (x_{i,j} - x_{i\cdot})^2. $$

We transform them further by putting

(51)    $$ \begin{aligned} v_{i,1} &= \sqrt{n}\, S_i \cos z_{i,n-2} \cos z_{i,n-3} \cdots \cos z_{i,2} \cos z_{i,1} \\ v_{i,2} &= \sqrt{n}\, S_i \cos z_{i,n-2} \cos z_{i,n-3} \cdots \cos z_{i,2} \sin z_{i,1} \\ v_{i,3} &= \sqrt{n}\, S_i \cos z_{i,n-2} \cos z_{i,n-3} \cdots \sin z_{i,2} \\ &\;\;\vdots \\ v_{i,n-1} &= \sqrt{n}\, S_i \sin z_{i,n-2} \end{aligned} $$

with the z's varying as follows

(52)    $$ 0 < z_{i,1} < 2\pi, \qquad -\pi/2 < z_{i,j} < \pi/2 \quad \text{for } j = 2, 3, \dots, n-2. $$

Of course, instead of the $S_i$ we should put their expressions in terms of
$T_1$ and the u's into (51). With the exception of a set of measure zero, which
can be ignored, the formulae above determine a one-to-one transformation of the
original space W of the x's into the space W' of $T_1, T_2, \dots, T_{N+1}$,
$u_1, \dots, u_{N-1}$, and $z_{i,1}, z_{i,2}, \dots, z_{i,n-2}$ for
i = 1, 2, ..., N.
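
The two steps can be traced on a single set of n measurements. The sketch below is an illustration under stated assumptions rather than the procedure of [5]: it takes the first step to be the usual orthogonal (Helmert) transformation, $v_k = (x_1 + \dots + x_k - k x_{k+1})/\sqrt{k(k+1)}$, which satisfies (50), and the second step to be the polar substitution (51). The function names are hypothetical, and n is required to be at least 3 with the measurements not all equal.

```python
import math

def to_mean_S_angles(x):
    """Map x_1..x_n (n >= 3, not all equal) to (mean, S, [z_1, ..., z_{n-2}])."""
    n = len(x)
    mean = sum(x) / n
    # Assumed first step: v_k = (x_1 + ... + x_k - k x_{k+1}) / sqrt(k (k + 1)).
    v = [(sum(x[:k]) - k * x[k]) / math.sqrt(k * (k + 1)) for k in range(1, n)]
    r = math.sqrt(sum(c * c for c in v))            # r = sqrt(n) * S by (50)
    z = [0.0] * (n - 2)
    rho = r
    for k in range(n - 1, 2, -1):                   # v_k determines z_{k-1}, k = n-1, ..., 3
        z[k - 2] = math.asin(max(-1.0, min(1.0, v[k - 1] / rho)))
        rho *= math.cos(z[k - 2])
    z[0] = math.atan2(v[1], v[0]) % (2 * math.pi)   # z_1 lies between 0 and 2 pi
    return mean, r / math.sqrt(n), z

def from_mean_S_angles(mean, S, z):
    """Inverse map: rebuild x_1..x_n from the mean, S and the angles."""
    n = len(z) + 2
    v = [0.0] * (n - 1)
    rho = math.sqrt(n) * S
    for k in range(n - 1, 2, -1):
        v[k - 1] = rho * math.sin(z[k - 2])
        rho *= math.cos(z[k - 2])
    v[0], v[1] = rho * math.cos(z[0]), rho * math.sin(z[0])
    x = []
    for j in range(1, n + 1):
        xj = mean + sum(v[k - 1] / math.sqrt(k * (k + 1)) for k in range(j, n))
        if j > 1:
            xj -= (j - 1) * v[j - 2] / math.sqrt((j - 1) * j)
        x.append(xj)
    return x

x = [4.1, 3.7, 5.2, 4.4, 3.9]                       # hypothetical measurements, n = 5
mean, S, z = to_mean_S_angles(x)
print(mean, S, z)
print(from_mean_S_angles(mean, S, z))               # should reproduce x
```

The round trip recovers the original measurements, which is the one-to-one property referred to above.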

In calculating the joint probability law of all the new variables, we notice
that, on the hypothesis tested, all the Nn original variables are mutually
independent. Consequently, the transformations (49) and (51), which refer to
separate groups of the $x_{i,j}$'s, corresponding to fixed values of i, could be
carried through separately. In doing so, we use formulae deduced elsewhere (see
[5], pp. 38-39) directly and obtain

(53)    $$ p(x_{i\cdot}, S_i, z_{i,1}, \dots, z_{i,n-2}) = \left( \frac{\sqrt{n}}{\sigma \sqrt{2\pi}} \right)^{n} S_i^{n-2}\, e^{-\frac{n}{2\sigma^2} \{ S_i^2 + (x_{i\cdot} - \xi_i)^2 \}} \prod_{j=2}^{n-2} \cos^{j-1} z_{i,j}. $$

It follows that

(54)    $$ \begin{aligned} p(x_{1\cdot}, \dots, x_{N\cdot}, S_1, \dots, S_N, z_{1,1}, \dots, z_{N,n-2}) &= \prod_{i=1}^{N} p(x_{i\cdot}, S_i, z_{i,1}, \dots, z_{i,n-2}) \\ &= \left( \frac{\sqrt{n}}{\sigma \sqrt{2\pi}} \right)^{Nn} e^{-\frac{n}{2\sigma^2} \sum_{i=1}^{N} \{ S_i^2 + (x_{i\cdot} - \xi_i)^2 \}} \prod_{i=1}^{N} S_i^{n-2} \prod_{k=1}^{N} \prod_{j=2}^{n-2} \cos^{j-1} z_{k,j}. \end{aligned} $$


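We now wish to introduce $T_1$ and the $u_i$ instead of the $S_i$'s. Since all
other variables remain unchanged the Jacobian of this transformation reduces to
that of (47). Simple calculations show that

(55)    $$ \left| \frac{\partial(S_1, \dots, S_N)}{\partial(T_1, u_1, \dots, u_{N-1})} \right| = \left( \tfrac{1}{2} \right)^{N} T_1^{\frac{1}{2}N - 1} \Big( 1 - \sum_{i=1}^{N-1} u_i \Big)^{-\frac{1}{2}} \prod_{i=1}^{N-1} u_i^{-\frac{1}{2}}. $$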
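
The "simple calculations" can be checked numerically by comparing the closed form with a finite-difference Jacobian of the substitution (47). The sketch assumes the closed form $(\tfrac12)^N T_1^{N/2-1}(1-\sum u_i)^{-1/2}\prod u_i^{-1/2}$ for the absolute value of the determinant. The helper names, the step h and the test point are illustrative assumptions.

```python
import math

def s_from_tu(T1, u):
    """Substitution (47): S_i = sqrt(u_i T_1) for i < N, S_N = sqrt((1 - sum u) T_1)."""
    return [math.sqrt(ui * T1) for ui in u] + [math.sqrt((1 - sum(u)) * T1)]

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    size, d = len(m), 1.0
    for c in range(size):
        p = max(range(c, size), key=lambda r: abs(m[r][c]))
        if m[p][c] == 0.0:
            return 0.0
        if p != c:
            m[c], m[p] = m[p], m[c]
            d = -d
        d *= m[c][c]
        for r in range(c + 1, size):
            f = m[r][c] / m[c][c]
            for k in range(c, size):
                m[r][k] -= f * m[c][k]
    return d

def numeric_jacobian(T1, u, h=1e-6):
    """|d(S_1..S_N) / d(T_1, u_1..u_{N-1})| by central differences."""
    point = [T1] + list(u)
    cols = []
    for k in range(len(point)):
        up, dn = point[:], point[:]
        up[k] += h
        dn[k] -= h
        s_up, s_dn = s_from_tu(up[0], up[1:]), s_from_tu(dn[0], dn[1:])
        cols.append([(a - b) / (2 * h) for a, b in zip(s_up, s_dn)])
    return abs(det([list(row) for row in zip(*cols)]))

def closed_form(T1, u):
    N = len(u) + 1
    return (0.5 ** N * T1 ** (N / 2 - 1) * (1 - sum(u)) ** -0.5
            * math.prod(ui ** -0.5 for ui in u))

print(numeric_jacobian(2.5, [0.2, 0.3, 0.1]), closed_form(2.5, [0.2, 0.3, 0.1]))
```

The two printed numbers should agree to within the accuracy of the finite differences.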
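Using this expression and substituting (47) in (54) we finally obtain

(56)    $$ \begin{aligned} p(x_{1\cdot}, \dots, &\, x_{N\cdot}, T_1, u_1, \dots, u_{N-1}, z_{1,1}, \dots, z_{N,n-2}) \\ &= \left( \tfrac{1}{2} \right)^{N} \left( \frac{\sqrt{n}}{\sigma \sqrt{2\pi}} \right)^{Nn} e^{-\frac{n}{2\sigma^2} \{ T_1 + \sum_{i=1}^{N} (x_{i\cdot} - \xi_i)^2 \}}\, T_1^{\frac{1}{2}N(n-1) - 1} \\ &\quad \times \Big\{ \Big( 1 - \sum_{i=1}^{N-1} u_i \Big) \prod_{i=1}^{N-1} u_i \Big\}^{\frac{1}{2}(n-3)} \prod_{k=1}^{N} \prod_{j=2}^{n-2} \cos^{j-1} z_{k,j}. \end{aligned} $$

To obtain the relative probability law of
$u_1, u_2, \dots, u_{N-1}, z_{1,1}, \dots, z_{N,n-2}$ given $T_1$ and the
$T_{i+1} = x_{i\cdot}$, we must calculate $p(T_1, T_2, \dots, T_{N+1})$ and
divide expression (56) by it. Of course, $p(T_1, T_2, \dots, T_{N+1})$ is
obtained from (56) by integrating over the whole of $W'(\varphi)$, that is, for
all other variables between the extreme limits of their variation. As these
limits are independent of the values of $T_1, T_2, \dots, T_{N+1}$, the result
will be

(57)    $$ p(T_1, T_2, \dots, T_{N+1}) = c\, \sigma^{-Nn}\, T_1^{\frac{1}{2}N(n-1) - 1}\, e^{-\frac{n}{2\sigma^2} \{ T_1 + \sum_{i=1}^{N} (T_{i+1} - \xi_i)^2 \}} $$

where c denotes a constant. Thus

(58)    $$ p(u_1, \dots, u_{N-1}, z_{1,1}, \dots, z_{N,n-2} \mid T_1, \dots, T_{N+1}) = c_1 \Big\{ \Big( 1 - \sum_{i=1}^{N-1} u_i \Big) \prod_{i=1}^{N-1} u_i \Big\}^{\frac{1}{2}(n-3)} \prod_{k=1}^{N} \prod_{j=2}^{n-2} \cos^{j-1} z_{k,j}, $$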





with the region of variation $W'(\varphi)$ limited by the following inequalities

(59)    $$ \begin{aligned} &0 < u_i, \qquad \sum_{i=1}^{N-1} u_i < 1, \\ &0 < z_{k,1} < 2\pi \qquad \text{for } k = 1, 2, \dots, N, \\ &-\pi/2 < z_{k,j} < \pi/2 \qquad j = 2, 3, \dots, n-2. \end{aligned} $$

Since (58) integrated over $W'(\varphi)$ is identically unity, $c_1$ is a purely
numerical constant.

Now to construct any region w similar to the sample space with respect to
$\sigma, \xi_1, \dots, \xi_N$, we must select, separately for each and all
systems $(\varphi)$ of values of $T_1, T_2, \dots, T_{N+1}$, a region
$w'(\varphi)$, part of $W'(\varphi)$ as defined by (59), with the sole
restriction that


(60)    $$ \int \cdots \int_{w'(\varphi)} p(u_1, \dots, u_{N-1}, z_{1,1}, \dots, z_{N,n-2} \mid T_1, \dots, T_{N+1})\, du_1 \cdots du_{N-1}\, dz_{1,1} \cdots dz_{N,n-2} = \varepsilon. $$

Obviously, there is an infinity of ways of selecting any single one of such
regions. For example, we could let the u's vary as indicated in (59) and limit
the z's by

(61)    $$ 0 < z_{k,1} \le a, \qquad -a \le z_{k,j} \le a \qquad (k = 1, 2, \dots, N;\ j = 2, 3, \dots, n-2) $$

where a is chosen so that (60) is satisfied. This choice of $w'(\varphi)$ may
correspond to one particular system of values of $T_1, T_2, \dots, T_{N+1}$ and
no other. Again, the same region (61) may be chosen to serve for all systems of
values of the T's. In this case, the region $w = \sum_{\varphi} w(\varphi)$
might be described as cylindrical. Any such region w will control errors of the
first kind in testing H to the same level of significance $\varepsilon$ and, as
far as these errors alone are concerned, each of these regions is of equal value.
Whatever the choice of regions $w'(\varphi)$ or $w(\varphi)$, the test of H will
consist of (1) observing the values of the $x_{i,j}$'s, (2) calculating the
corresponding value of $T_1, T_2, \dots, T_{N+1}$, the u's, and the z's, and
(3) noting whether the point with coordinates
$u_1, u_2, \dots, u_{N-1}, z_{1,1}, \dots, z_{N,n-2}$ falls in the region
$w(\varphi)$ chosen to correspond to the observed values of
$T_1, T_2, \dots, T_{N+1}$. Of course, in practical cases, the choice of
$w'(\varphi)$ for one system of values of the T's will not be quite unconnected
with that for others. On the contrary, there will probably be some more or less
simple rule connecting $w'(\varphi)$ with the corresponding systems of the T's.
As a result, the actual machinery of the test will be much simpler than that
described above and will consist of the calculation of only a very few functions
of the x's and in checking some simple inequalities.

Now our purpose is to select a region from the infinite family $F(\varepsilon)$ of all
regions similar to the sample space with respect to $\sigma, \xi_1, \ldots, \xi_N$ which we judge
most satisfactory for controlling errors of the second kind. Roughly speaking,
this region will have to be such that, if the hypothesis $H$ is not true, the observed
point $E$ will fall in this particular region as frequently as possible, in general.
Here we come to the necessity of specifying the ways in which we expect the
hypothesis $H$ to be untrue. It may be untrue in an infinite number of ways.
For example, the values of the $\sigma$'s may (1) be equally distributed over any given
range, (2) fall into just two groups $\sigma_i = 1$ and $\sigma_i = 2$, or (3) all $\sigma_i$'s except
the last may have the same value $\sigma$ while the last is $10\sigma$, and so forth. Any
such assumption will be called an hypothesis alternative to $H$. It is obvious
that the probability of $E$ falling in any given region $w$ will be different for each
of them. Therefore, if we wish to deduce a test which will detect the falsehood
of the hypothesis tested frequently, we must analyse the practical cases where
the test is to be applied and guess the ways in which the hypothesis tested is
usually wrong. Then we can deduce a test which will be, in one sense or
another, most sensitive to the assumed deviations from the hypothesis tested.
Needless to say, our guess may be right or wrong. In the latter case, an increased
volume of observational material may demonstrate its fallacy and suggest
the necessary modifications. In any case, it is important to know exactly
the class of alternatives for which our test is, in some particular way, the best.


5. The set of hypotheses alternative to H. Let us consider the routine analyses
made at some laboratory and try to discover the circumstances likely to
cause variation in their accuracy. First of all, we may think of assignable
causes such as a change in personnel, apparatus, or accommodation. These
and similar causes are likely to produce lasting effects; the test of the hypothesis
that they did not reduces to one of the equality of only two $\sigma$'s. An easy
application of known theory [20] shows that the familiar $F$ or $z$ test is unbiased
of type $B_1$, which means that it is preferable to any other. Consequently,
situations of this kind, and also similar ones for which the $L_1$ test is applicable [9],
need not be considered here, so that we may concentrate on cases where there is
no directly assignable cause of variation in the accuracy of the analyses. Assume
then that the personnel, the apparatus, the accommodation, etc., remain
the same. Now the accuracy of analyses depends on a multitude of causes
evading identification, such as changes in the efficiency of the workers. In
principle, they try to have the highest, and therefore a constant, level of accuracy.
Uncontrollable circumstances cause some fluctuations about a certain average
and we expect that small deviations from this average will occur more frequently
than large ones. With this in mind, the author feels that it would be appropriate
to expect that variations in accuracy, if any, will have a random character
so that any $\sigma_i$ referring to one particular group of analyses, or any monotonic
function of that $\sigma_i$, could be considered as an essentially positive random variable,
having some unimodal probability law. To make the problem of the best test
sufficiently specific, we must specify this law entirely. Here we face a somewhat
embarrassing freedom of choice. For lack of more precise information as
to the random variability of $\sigma_i$, we guide ourselves by considerations of ease in
calculations. From this point of view it is convenient to consider the variable

$$h = \sigma^{-2} \tag{62}$$


and assume that, within a given period of time which is not too long, when the
conditions in a laboratory are sensibly constant, it is varying according to the law

$$p(h) = \beta^{\alpha} h^{\alpha-1} e^{-\beta h} / \Gamma(\alpha) \qquad \text{for } 0 < h, \tag{63}$$

where $\alpha$ and $\beta$ are unknown non-negative constants. It is useful to express these
constants in terms of two new ones which have an obvious interpretation: $h_0$, the
expectation of $h$, and $\nu$, the square of the coefficient of variation of $h$. Easy
calculations give

$$\alpha = 1/\nu, \qquad \beta = 1/(h_0\nu). \tag{64}$$

Now $p(h)$ has the form

$$p(h) = \frac{1}{(h_0\nu)^{1/\nu}\,\Gamma(1/\nu)}\, h^{(1/\nu)-1}\, e^{-h/(h_0\nu)}. \tag{65}$$

We note that when $\nu \to 0$ the probability law (65) tends to a limiting
discontinuous form with $P\{h = h_0\} = 1$. This corresponds to the hypothesis $H$
that we wish to test. The type of law represented by (65) is known to be
rather flexible. Consequently, we may easily assume that even though the true
variability of $h$ (or $\sigma$) does not exactly correspond to (65), there will be a system
of values of $h_0$ and $\nu$ for which the difference between the true law and (65) will
not be large. Therefore, a test which is particularly sensitive to deviations of $\nu$
from zero with law (65) will be reasonably sensitive in real practical cases.
However, this is an assumption by the author. But it is subject to test and this
will be done below.
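The reparametrization (64) can be illustrated numerically. The following is a minimal sketch, not part of the original computations: the function names `gamma_params` and `sample_h` are hypothetical, and the sampling relies on Python's standard `random.gammavariate`, which takes a shape and a scale (so the scale is $1/\beta = h_0\nu$).

```python
import random


def gamma_params(h0, nu):
    """Return (alpha, beta) of law (63) from the mean h0 and the squared
    coefficient of variation nu, following (64)."""
    if h0 <= 0 or nu <= 0:
        raise ValueError("h0 and nu must be positive")
    alpha = 1.0 / nu
    beta = 1.0 / (h0 * nu)
    return alpha, beta


def sample_h(h0, nu, size, seed=0):
    """Draw `size` values of h from law (65).

    random.gammavariate(shape, scale) is parametrized by a scale,
    hence the second argument is 1/beta = h0 * nu.
    """
    alpha, beta = gamma_params(h0, nu)
    rng = random.Random(seed)
    return [rng.gammavariate(alpha, 1.0 / beta) for _ in range(size)]


if __name__ == "__main__":
    hs = sample_h(h0=2.0, nu=0.25, size=20000, seed=1)
    mean = sum(hs) / len(hs)
    var = sum((x - mean) ** 2 for x in hs) / len(hs)
    print("sample mean:", mean)               # should be near h0
    print("sample CV^2:", var / mean ** 2)    # should be near nu
```

With $\alpha = 1/\nu$ and $\beta = 1/(h_0\nu)$ the mean $\alpha/\beta$ equals $h_0$ and the squared coefficient of variation $1/\alpha$ equals $\nu$, which is what the sample check illustrates.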

Formula (63) represents the hypothetical probability law of the variable $h$
which is not directly observable. We must use this formula to obtain the
probability law of the observable $x$'s alternative to (6), which corresponds to the
hypothesis $H$ being true. Using $h = 1/\sigma^2$, we write the relative probability law
of $x_{i,1}, x_{i,2}, \ldots, x_{i,n}$ given $h$

$$p(x_{i,1}, \ldots, x_{i,n} \mid h) = \Big(\frac{h}{2\pi}\Big)^{n/2} \exp\Big\{-\frac{h}{2} \sum_{j=1}^{n} (x_{i,j} - \xi_i)^2\Big\}. \tag{66}$$

Multiplying (66) by (65) we obtain the joint probability law of $h$ and the $x_{i,j}$'s
referring to one group of analyses

$$p(h, x_{i,1}, \ldots, x_{i,n}) = \frac{h^{\frac12 n + 1/\nu - 1}}{(2\pi)^{n/2} (h_0\nu)^{1/\nu}\, \Gamma(1/\nu)} \exp\Big\{-h\Big(\frac{1}{h_0\nu} + \frac12 \sum_{j=1}^{n} (x_{i,j} - \xi_i)^2\Big)\Big\}. \tag{67}$$
Integrating (67) with respect to $h$ from zero to infinity, we obtain the absolute
probability law of $x_{i,1}, x_{i,2}, \ldots, x_{i,n}$, all referring to the $i$th group of analyses.
Assuming that the value of $h$ in one group of analyses is independent of that in
another, we obtain the joint probability law of all the $Nn$ observable $x_{i,j}$'s by
simply multiplying the probability laws referring to particular groups of $n$ of
them. The result will depend on $N + 2$ unknown parameters, $\xi_1, \xi_2, \ldots, \xi_N$,
$h_0$, and $\nu$. As the last two will play a more important role than the others
we shall denote the probability law by $p(E \mid h_0, \nu)$. Easy calculations give

$$p(E \mid h_0, \nu) = \left(\frac{\Gamma(\tfrac12 n + 1/\nu)}{\Gamma(1/\nu)}\right)^{N} \left(\frac{h_0\nu}{2\pi}\right)^{\frac12 Nn} \Bigg/ \prod_{i=1}^{N} \Big(1 + \tfrac12 h_0\nu \sum_{j=1}^{n} (x_{i,j} - \xi_i)^2\Big)^{\frac12 n + 1/\nu}. \tag{68}$$


We easily check that for $\nu \to 0$ (68) approaches the law (6) with $h_0 = \sigma^{-2}$.
Therefore, the problem that we shall treat below will be to assume that the
observable $x$'s follow (68) with some $h_0 > 0$ and some $\nu \ge 0$ and to test the
hypothesis $H$ that $\nu = 0$. More specifically, we shall try to choose among all
the regions of the family $F(\varepsilon)$, found in the preceding section, the one over which
the integral of the function (68) is, in general, the largest.

Before doing so, it may be useful to exhibit some experimental evidence in
favor of the assumption that, if $\sigma$ is not constant in some conditions of analysis
or measurement, then it varies in such a way that the variability of the $x$'s has
at least some characteristics appropriate to (68).
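To make the alternative model concrete, the two-stage mechanism behind (68) can be simulated: a precision $h$ is drawn for each group from the law (65), and the $n$ measurements of that group are then drawn from a normal law with variance $1/h$. The sketch below is only an illustration under these assumptions; the function name `simulate_groups` and its arguments are hypothetical and do not appear in the original.

```python
import math
import random


def simulate_groups(N, n, h0, nu, xi=None, seed=0):
    """Simulate N groups of n measurements following (68).

    For each group, h is drawn from the gamma law (65) with mean h0 and
    squared coefficient of variation nu; nu == 0 reproduces the
    hypothesis H (constant precision h0). xi holds the group means and
    defaults to zeros.
    """
    rng = random.Random(seed)
    xi = [0.0] * N if xi is None else list(xi)
    groups = []
    for i in range(N):
        if nu > 0:
            h = rng.gammavariate(1.0 / nu, h0 * nu)  # shape, scale
        else:
            h = h0
        sd = 1.0 / math.sqrt(h)                      # sigma = h^(-1/2), see (62)
        groups.append([rng.gauss(xi[i], sd) for _ in range(n)])
    return groups


if __name__ == "__main__":
    data = simulate_groups(N=5, n=3, h0=1.0, nu=0.2, seed=3)
    for row in data:
        print(["%.3f" % x for x in row])
```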

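Introduce the notation

$$w_i = nS_i^2 = \sum_{j=1}^{n} (x_{i,j} - x_{i\cdot})^2. \tag{69}$$

Using transformations (49), (50), and (69), successively, we easily deduce the
probability law of $w_i$

$$p(w_i) = \frac{(h_0\nu/2)^{\frac12(n-1)}\, \Gamma(\tfrac12(n-1) + 1/\nu)}{\Gamma(\tfrac12(n-1))\, \Gamma(1/\nu)} \cdot \frac{w_i^{\frac12(n-3)}}{(1 + \tfrac12 h_0\nu w_i)^{\frac12(n-1) + 1/\nu}}. \tag{70}$$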


If the hypothesis we have made about the variability of $h$, as expressed by (65),
is true in any particular case then the sums of squares (69), referring to each
particular group of analyses, are distributed according to (70). The reverse is
not necessarily true, of course, but it is comforting that a check of the above
in a number of broadly divergent circumstances gives satisfactory results. By
applying the transformation $1 + h_0\nu w_i/2 = t^{-1}$, the integral of (70) is easily
reduced to an Incomplete Beta function whence Pearson's tables [24] provide
an easy means of calculating the theoretical probability that $w_i$ is within any
given limits.
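The reduction to an Incomplete Beta function can be spelled out as follows. This is a sketch using only (70) and the substitution named in the text; the symbol $I_x(p, q)$ for the normalized incomplete Beta function and the abbreviation $t_W$ are notation assumed here, not taken from the original.

```latex
t = \frac{1}{1 + \tfrac12 h_0 \nu w_i}, \qquad
w_i = \frac{2}{h_0 \nu}\,\frac{1 - t}{t}, \qquad
|dw_i| = \frac{2}{h_0 \nu}\,\frac{dt}{t^{2}},
\\[4pt]
p(w_i)\,|dw_i| =
\frac{\Gamma\!\left(\tfrac12(n-1) + 1/\nu\right)}
     {\Gamma\!\left(\tfrac12(n-1)\right)\Gamma(1/\nu)}\;
t^{\,1/\nu - 1}\,(1 - t)^{\frac12(n-3)}\,dt,
\\[4pt]
P\{w_i \le W\} = 1 - I_{t_W}\!\left(1/\nu,\ \tfrac12(n-1)\right),
\qquad t_W = \frac{1}{1 + \tfrac12 h_0 \nu W}.
```

In words, $t$ follows a Beta law with parameters $1/\nu$ and $\tfrac12(n-1)$, so the probability that $w_i$ lies between two limits is a difference of two incomplete Beta values.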

Table I gives several observed distributions of the sums $w$ together with their
expected ones, calculated from (70) with the values of $h_0$ and $\nu$ fitted by the
method of moments. The last lines give particulars of the application of the
$\chi^2$ test for goodness of fit.

TABLE I
Comparison of empirical distributions of w with those calculated from (70)

Number | Author or source of data                      | Kind of measurement or analysis               | Total (Exp. / Obs.) | chi^2       | Degrees of freedom | P(chi^2)
1      | R. T. Birge                                   | Strong lines in the band spectra of nitrogen  | 100.00 / 100        | [illegible] | 7                  | .21
2      | R. T. Birge                                   | A solar spectrum line                         | [illegible]         | 12.67       | 10                 | .24
3      | K. Buszczyński and Sons, Ltd.                 | Sugar content of beets                        | [illegible]         | 18.75       | 11                 | .066
4      | A. A. Michelson, F. G. Pease, and F. Pearson  | Velocity of light                             | [illegible]         | 18.09       | 18                 | .45
5      | W. S. Svenson                                 | Octane rating                                 | 120.99 / 121        | 13.35       | 10                 | .21

[The rows of the table giving the expected and observed frequencies for each class of w are illegible in this copy.]

The symbols } are used to indicate the groupings used in the calculation of
the $\chi^2$. The groupings were made so as to have the expected frequency in a
class at least equal to 3.5.

The origin of the data used to compile Table I is as follows:

For the data providing frequency distributions numbered 1 and 2, the author
is deeply indebted to Professor Raymond T. Birge. The methods of measurement
and their purpose are explained in the publications [25] and [26], respectively.
These papers also contain various compilations of the results of the
measurements. However, the original single measurements, necessary for the
present paper, are naturally unpublished and Professor Birge was kind enough
to find them for the author in his records.

Frequency distribution No. 3 was compiled from a book of records of sugar
beet trials carried out by Messrs. K. Buszczyński and Sons, Ltd. in Górka
Narodowa, Poland.

The 4th distribution was constructed from the original measurements of the 
velocity of light as published [27] by Michelson, Pease, and Pearson. The 
measurements made during single days were treated as forming separate groups. 

Distribution No. 5 originated from repeated measurements of Octane Rating 
conducted by a refining company in California. They were made accessible by 
Mr. Walter S. Svenson and it is a pleasure to express the author’s deep grati- 
tude to him. 

The number of observations in each column is not very large. It may be 
expected that if it were increased, the differences between the hypothetical 
distributions and the observed ones would become more apparent. It seems 
safe, however, to assume that in a number of instances the hypothesis as to the
character of the variability of $w_i$ is not in very bad disagreement with the actual
facts. It would be most interesting to have some more data on the subject.
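A fit of $h_0$ and $\nu$ by the method of moments, as used for Table I, can be sketched as follows. The text does not say which moments were equated, so the sketch assumes the first two moments of $w$; integrating (70) gives $E(w) = (n-1)/(h_0(1-\nu))$ and $E(w^2) = (n^2-1)/(h_0^2(1-\nu)(1-2\nu))$ for $\nu < 1/2$. The function names are hypothetical.

```python
def sample_moments(ws):
    """First two raw sample moments of the observed sums of squares w."""
    k = len(ws)
    m1 = sum(ws) / k
    m2 = sum(w * w for w in ws) / k
    return m1, m2


def fit_by_moments(m1, m2, n):
    """Solve E(w) = m1 and E(w^2) = m2 for (h0, nu) under law (70).

    Assumption of this sketch: the first two raw moments are matched.
    With r = m2 / m1^2 the model gives
        r = (n + 1)(1 - nu) / ((n - 1)(1 - 2 nu)),
    which is solved for nu; a ratio at or below its value for nu = 0
    is mapped to nu = 0.
    """
    r = m2 / (m1 * m1)
    r0 = (n + 1.0) / (n - 1.0)          # value of r when nu = 0
    if r <= r0:
        nu = 0.0
    else:
        nu = (r * (n - 1) - (n + 1)) / (2.0 * r * (n - 1) - (n + 1))
    h0 = (n - 1) / (m1 * (1.0 - nu))
    return h0, nu


if __name__ == "__main__":
    # Exact moments for n = 5, h0 = 2, nu = 0.1 are recovered by the fit.
    n, h0, nu = 5, 2.0, 0.1
    m1 = (n - 1) / (h0 * (1 - nu))
    m2 = (n * n - 1) / (h0 ** 2 * (1 - nu) * (1 - 2 * nu))
    print(fit_by_moments(m1, m2, n))
```

In practice `sample_moments` would be applied to the observed $w_i$ and its output passed to `fit_by_moments`; the fitted $\nu$ always falls in $[0, 1/2)$.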


6. The best critical region for testing H against a particular alternative. It
seems unquestionable that the most desirable test of any hypothesis is the
uniformly most powerful test (U. M. P. Test) with respect to the whole class of
simple hypotheses alternative to the one which is being tested. Denote by $H$
the hypothesis tested, by $h$ any simple admissible hypothesis alternative to $H$,
and by $\Omega$ the set of all $h$'s. If $w_0$ is the critical region corresponding to the
U. M. P. Test, then $w_0$ has these properties:

(1) $$P\{E \in w_0 \mid H\} = \varepsilon. \tag{71}$$

(2) If $w$ is any other region such that $P\{E \in w \mid H\} = \varepsilon$ then
$$P\{E \in w_0 \mid h\} \ge P\{E \in w \mid h\}, \tag{72}$$
whatever be $h \in \Omega$.

Following the known method [18], we shall see whether a test of the hypothesis
$H$ considered in the preceding sections exists which is a U. M. P. Test with
respect to the whole class of admissible hypotheses that specify the probability
laws (68) with any $h_0 > 0$ and $\nu > 0$.

The method consists of considering one particular alternative hypothesis $h'$,
that is, one particular set of values of $h_0 > 0$ and $\nu > 0$ and finding the best
critical region $w_{h_0,\nu}$ for testing $H$ against $h'$. If this region appears to depend
on $\nu$ and/or on $h_0$ then there is no U. M. P. Test. The region $w_{h_0,\nu}$ is found by
determining, for each system $(\varphi)$ of $T_1, T_2, \ldots, T_{N+1}$ separately, a part $w_{h_0,\nu}(\varphi)$
determined by the inequality

$$p(E \mid h_0, \nu) \ge k(\varphi)\, p(E \mid H) \tag{73}$$

where $k(\varphi)$ is a function of $T_1, T_2, \ldots, T_{N+1}$ so determined that the relation
(60) is satisfied. Substituting (6) and (68) in (73), taking the logarithm of both
sides, and combining all terms which are constant or depend only on $T_1, T_2, \ldots,
T_{N+1}$, we have

$$\sum_{i=1}^{N} \log\big(1 + \tfrac12 h_0\nu n (S_i^2 + (T_{i+1} - \xi_i)^2)\big) \le \psi_1(T_1, \ldots, T_{N+1}), \quad \text{(say)}. \tag{74}$$

Clearly, for $T_1, T_2, \ldots, T_{N+1}$ fixed, this inequality imposes a restriction on the
variability of $u_1, u_2, \ldots, u_{N-1}$ while $z_{1,1}, \ldots, z_{N,n-2}$ are allowed to vary
indiscriminately within the extreme limits (52). But the region $w_{h_0,\nu}(\varphi)$ determined
by (74) also depends on the product $h_0\nu$. Therefore, there is no uniformly most
powerful test for testing $H$ against any and all simple alternatives specifying (68).


7. A critical region of an unbiased type. There seem to be no grounds for
dissension that when a U. M. P. Test exists and is readily applicable, it is
preferable to any other test, but the situation is quite different when there is no
U. M. P. Test. In such cases, practical considerations may suggest a variety
of requirements for a second best test of the hypothesis. Among these, we may
suggest the following considerations:

Fix, for a moment, the values of $h_0, \xi_1, \ldots, \xi_N$, take any region $w$ of the
family $F(\varepsilon)$, and consider the probability of $E$ falling in $w$ as a function of $\nu$
only. This is called the power function

$$\beta(\nu \mid w) = \int \cdots \int_{w} p(E \mid h_0, \nu)\, dx_{1,1} \cdots dx_{N,n}. \tag{75}$$

Here, of course, $\nu \ge 0$. Because of the properties of regions belonging to $F(\varepsilon)$
we have $\beta(0 \mid w) = \varepsilon$. If $\nu > 0$, the value of $\beta(\nu \mid w)$ represents the
corresponding probability of the test (based on $w$) discovering the falsehood of $H$.
It is obviously desirable to have this probability as large as possible. In any
case, it should be greater than $\varepsilon$. This last restriction is known as that of
unbiasedness [19], [20], [28]. Further, since it is impossible to maximize $\beta(\nu \mid w)$
for all values of $\nu$, we must choose those for which it is most desirable, in our
opinion, to concentrate our efforts to increase $\beta(\nu \mid w)$. One possible point of
view is that these values should be very close to the hypothetical value $\nu = 0$.
For if $\nu$ is considerably larger than zero, we may argue that there will be no
need to apply any refined statistical test to detect the falsehood of $H$. Of
course, this argument has no mathematical character and its general acceptance
is not suggested. In fact, we may argue that if $\nu$ is greater than zero but very
small, it will be almost impossible to detect the falsehood of $H$ by any test and,
therefore, our efforts should be concentrated on values of $\nu$ which are of
considerable size.

These are considerations of non-mathematical character; the role of mathe- 
matical statistics is limited to devising tests and elucidating their properties. 
If these last are understood by practical statisticians, each may choose according
to his problem. Note that what could be termed the "properties" of a test are
summarized in the power function $\beta(\nu \mid w)$ with its relation to the power
functions of other possible tests of the same hypothesis.

In this paper we shall deal with tests particularly sensitive to small deviations
of $\nu$ from its hypothetical value $\nu = 0$. In this respect, our first trial is to find
a region $w_0$, belonging to the family $F(\varepsilon)$ and satisfying the condition

$$\frac{\partial \beta(\nu \mid w_0)}{\partial \nu}\bigg|_{\nu=0} \ge \frac{\partial \beta(\nu \mid w)}{\partial \nu}\bigg|_{\nu=0} \tag{76}$$

where $w$ is any other region belonging to the same family $F(\varepsilon)$.

Because of the peculiar structure of the regions belonging to $F(\varepsilon)$, the problem
is immediately reduced to finding regions $w(\varphi)$. According to theory explained
elsewhere [18] these should satisfy the condition

$$\frac{\partial p(E \mid h_0, \nu)}{\partial \nu}\bigg|_{\nu=0} \ge k(T)\, p(E \mid H), \tag{77}$$

where $k(T)$ depends on $T_1, T_2, \ldots, T_{N+1}$ only and is determined to satisfy the
condition of similarity (60). Condition (77) is equivalent to

$$\frac{\partial \log p(E \mid h_0, \nu)}{\partial \nu}\bigg|_{\nu=0} \ge k(T). \tag{78}$$

Taking the logarithm of (68), differentiating with respect to $\nu$, putting $\nu$ equal
to zero, substituting in (78), and combining all the terms which are constant
on $W(\varphi)$ into a single term which we may write as $\tfrac18 n^2 h_0^2 k_1(T)$, we have

$$\sum_{i=1}^{N} \big(S_i^2 + (T_{i+1} - \xi_i)^2\big)^2 \ge k_1(T). \tag{79}$$
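The step from (78) to (79) can be checked by expanding the logarithm of (68) in powers of $\nu$. The following is a sketch of that calculation; the abbreviation $a_i = \sum_j (x_{i,j} - \xi_i)^2 = n(S_i^2 + (T_{i+1} - \xi_i)^2)$ and the symbol $C(\nu)$, collecting the terms of $\log p$ that do not depend on the sample point, are introduced here for convenience and are not in the original.

```latex
\log p(E \mid h_0, \nu)
  = C(\nu) - \Big(\frac{n}{2} + \frac{1}{\nu}\Big)
    \sum_{i=1}^{N} \log\!\big(1 + \tfrac12 h_0 \nu a_i\big),
\\[4pt]
\Big(\frac{n}{2} + \frac{1}{\nu}\Big) \log\!\big(1 + \tfrac12 h_0 \nu a_i\big)
  = \tfrac12 h_0 a_i
    + \nu \Big(\tfrac14 n h_0 a_i - \tfrac18 h_0^{2} a_i^{2}\Big)
    + O(\nu^{2}),
\\[4pt]
\frac{\partial \log p(E \mid h_0, \nu)}{\partial \nu}\bigg|_{\nu = 0}
  = C'(0) + \tfrac18 h_0^{2} \sum_{i=1}^{N} a_i^{2}
    - \tfrac14 n h_0 \sum_{i=1}^{N} a_i .
```

Since $\sum_i a_i = n\big(T_1 + \sum_i (T_{i+1} - \xi_i)^2\big)$ is fixed once the $T$'s are fixed, only $\sum_i a_i^2 = n^2 \sum_i (S_i^2 + (T_{i+1} - \xi_i)^2)^2$ varies over $W(\varphi)$, and (78) reduces to an inequality of the form (79).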


We note that condition (79) determining, so to speak, the shape of the region
$w_0(\varphi)$ does not imply any restriction on the variability of the $z$'s but only on
the $u$'s. However, the region $w_0(\varphi)$ as determined by (79) has the disadvantage
of being dependent on the values of the $\xi_i$. Since these are not specified by
the hypothesis tested, we are not able to determine the critical regions belonging
to the family $F(\varepsilon)$ and maximizing the derivative $\partial\beta(\nu \mid w)/\partial\nu\,]_{\nu=0}$. The region
which does so for some particular system $\xi_1, \xi_2, \ldots, \xi_N$ of values of the $\xi$'s
will lose this property if the system of values of the $\xi$'s is appropriately changed.
Therefore, our choice of the region maximizing the derivative of the power function
at $\nu = 0$ should be made not from the whole family $F(\varepsilon)$ but from a subfamily
$F_1(\varepsilon)$ composed only of such regions which also possess the supplementary
property that

$$\frac{\partial \beta(\nu \mid w)}{\partial \nu}\bigg|_{\nu=0} = \text{constant}, \tag{80}$$

that is, has a value independent of $\xi_1, \xi_2, \ldots, \xi_N$. The determination of this
subfamily $F_1(\varepsilon)$ embracing all such regions is an interesting problem. Until it is
solved, we use an obvious subfamily $F_2(\varepsilon)$ of regions $w$ which have the desired
property, but we do not know whether or not $F_2(\varepsilon)$ contains all such regions.²

The family $F_2(\varepsilon)$ is defined as consisting of those regions belonging to $F(\varepsilon)$
which could be described as cylindrical with their generators parallel to the
intersection of $T_{i+1} = x_{i\cdot} = $ constant, for $i = 1, 2, \ldots, N$. In other words
and more precisely, a region $w$ of the family $F(\varepsilon)$ belongs to $F_2(\varepsilon)$ only if the
question of its including a given point $E$ depends on $Nn - N$ of its coordinates,
namely on $T_1, u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2}$ and not on $T_2, T_3, \ldots, T_{N+1}$.

We easily show that any region $w$ belonging to $F_2(\varepsilon)$ possesses the property
that its power function is independent of the $\xi_i$'s. Denote by $w'$ the set of systems
of values of $T_1, u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2}$ corresponding to points
included in any given region $w$ of the family $F_2(\varepsilon)$. We see that the power
function $\beta(\nu \mid w)$, equal to the integral of (68) over $w$, can be calculated by using
the transformations (47), (49), and (51). Then the region of integration for
$T_1, u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2}$ is what we have just denoted by $w'$ and the
integrations for $T_{i+1} = x_{i\cdot}$ extend from $-\infty$ to $+\infty$ irrespective of the fixed
values of the other variables. These integrations are easily carried out by
substituting

$$\tfrac12 n h_0 \nu (x_{i\cdot} - \xi_i)^2 = (1 + \tfrac12 n h_0 \nu S_i^2)\, t_i^2. \tag{81}$$

The final result is

$$\beta(\nu \mid w) = \int \cdots \int_{w'} p(T_1, u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2})\, dT_1 \cdots dz_{N,n-2}. \tag{82}$$

Here

$$p(T_1, u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2}) = c(\nu)\, \Phi(T_1, u, z) \Bigg/ \prod_{i=1}^{N} \big(1 + \tfrac12 n h_0 \nu S_i^2\big)^{\frac12(n-1) + 1/\nu}, \tag{83}$$

where $c(\nu)$ denotes a constant depending on $\nu$, $\Phi(T_1, u, z)$ denotes a function of
all the $N(n - 1)$ variables involved, independent of $\nu$, and $S_i^2$ denotes expressions
(47) for short. We see that (82) is independent of the $\xi_i$'s.

Since the region $w$ belongs to $F(\varepsilon)$, it is composed of sections $w(\varphi)$ selected
separately on each hypersurface $T_1 = $ constant and $T_{i+1} = $ constant, $i = 1,
2, \ldots, N$. Because of the definition of the family $F_2(\varepsilon)$, the sections $w(\varphi)$ are
independent of $T_2, T_3, \ldots, T_{N+1}$ so that each of them can be selected only in
accordance with the value of $T_1$. Therefore, we may denote them by $w(T_1)$.
As far as property (80) is concerned, the choice is arbitrary. But the property
of similarity requires the fulfillment of condition (60) which, in the present case,
reduces to

$$\int \cdots \int_{w(T_1)} p(u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2} \mid T_1, \ldots, T_{N+1})\, du_1 \cdots dz_{N,n-2} = \varepsilon. \tag{84}$$

² Regions with the property (80) and belonging to $F(\varepsilon)$ but not to $F_2(\varepsilon)$ exist. Probably,
however, each of them differs from one of the regions of $F_2(\varepsilon)$ by a set of measure zero only.







Applying the method already used, we find that sections $\bar{w}(T_1)$ of the region $\bar{w}$
belonging to $F_2(\varepsilon)$ and maximizing the derivative $\partial\beta(\nu \mid w)/\partial\nu\,]_{\nu=0}$ are determined,
separately for each value of $T_1$, by the inequality

$$\frac{\partial \log p(T_1, u_1, \ldots, z_{N,n-2})}{\partial \nu}\bigg|_{\nu=0} \ge k_2(T_1) \tag{85}$$

where $k_2(T_1)$ denotes a function of $T_1$ determined to satisfy (84).
Substituting (83) in (85) we easily find that this condition is equivalent to

$$\zeta = \sum_{i=1}^{N-1} u_i^2 + \Big(1 - \sum_{i=1}^{N-1} u_i\Big)^2 \ge k_3(T_1) \tag{86}$$

where, again, $k_3(T_1)$ is determined for each particular value of $T_1$ to satisfy (84).
As (86) does not imply any restrictions on the variability of $z_{1,1}, z_{1,2}, \ldots, z_{N,n-2}$,
the integrations for the $z$'s while calculating (84) must be carried out over the
extreme limits (52). This will reduce the integrand to the relative probability
law of $u_1, u_2, \ldots, u_{N-1}$ given all the $T$'s. This law is easily calculated from
(58) and is

$$p(u_1, u_2, \ldots, u_{N-1} \mid T_1, T_2, \ldots, T_{N+1}) = \frac{\Gamma(\tfrac12 N(n-1))}{\Gamma(\tfrac12(n-1))^{N}} \Big(\Big(1 - \sum_{i=1}^{N-1} u_i\Big) \prod_{i=1}^{N-1} u_i\Big)^{\frac12(n-3)} = p(u_1, u_2, \ldots, u_{N-1}). \tag{87}$$

As (87) is independent of $T_1, T_2, \ldots, T_{N+1}$, it is also the absolute probability
law of the $u$'s and hence $k_3(T_1)$ is independent of $T_1$. In accordance with the
notation adopted for the left side of (86), namely $\zeta$, and since the choice of
$k_3(T_1)$ depends on $\varepsilon$, $n$, and $N$, we may use $\zeta_\varepsilon$ instead of $k_3(T_1)$. Then the region
$\bar{w}$ is determined by the inequality

$$\zeta = \sum_{i=1}^{N-1} u_i^2 + \Big(1 - \sum_{i=1}^{N-1} u_i\Big)^2 \ge \zeta_\varepsilon \tag{88}$$

or, returning to the original variables, by the inequality

$$\zeta = \sum_{i=1}^{N} S_i^4 \Bigg/ \Big(\sum_{i=1}^{N} S_i^2\Big)^2 \ge \zeta_\varepsilon \tag{89}$$

where $\zeta_\varepsilon$ is the root of the equation

$$\frac{\Gamma(\tfrac12 N(n-1))}{\Gamma(\tfrac12(n-1))^{N}} \int \cdots \int_{\zeta \ge \zeta_\varepsilon} \Big(\Big(1 - \sum_{i=1}^{N-1} u_i\Big) \prod_{i=1}^{N-1} u_i\Big)^{\frac12(n-3)} du_1 \cdots du_{N-1} = \varepsilon. \tag{90}$$

This region $\bar{w}$ has the following property: of all the regions belonging to the
family $F_2(\varepsilon)$, the derivative of the power function of $\bar{w}$ at the point $\nu = 0$ is the
greatest. Thus, as far as the values of $\nu$ close to zero are concerned, we may
say that, for testing $H$, $\bar{w}$ is the most powerful critical region in the family $F_2(\varepsilon)$.
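The criterion (89) is simple to compute from the raw measurements. The sketch below is an illustration only: the function names `zeta_statistic` and `reject` are hypothetical, every group is assumed to contain the same number $n$ of measurements, and the critical value $\zeta_\varepsilon$ has to be supplied by the user.

```python
def zeta_statistic(groups):
    """Compute zeta = sum(S_i^4) / (sum(S_i^2))^2 as in (89).

    `groups` is a list of N lists, each holding the n measurements of one
    group. S_i^2 is the mean squared deviation of group i from its own
    mean, i.e. w_i / n in the notation of (69).
    """
    s2 = []
    for g in groups:
        n = len(g)
        mean = sum(g) / n
        s2.append(sum((x - mean) ** 2 for x in g) / n)
    total = sum(s2)
    if total == 0:
        raise ValueError("all groups have zero spread; zeta is undefined")
    return sum(v * v for v in s2) / (total * total)


def reject(groups, zeta_eps):
    """Reject H when the observed zeta reaches the critical value zeta_eps."""
    return zeta_statistic(groups) >= zeta_eps


if __name__ == "__main__":
    equal_spread = [[0.0, 2.0], [1.0, 3.0], [5.0, 7.0]]
    one_noisy = [[0.0, 2.0], [1.0, 1.0], [4.0, 4.0]]
    print(zeta_statistic(equal_spread))   # smallest possible value, 1/N
    print(zeta_statistic(one_noisy))      # largest possible value, 1
```

The two toy inputs show the extremes of the criterion: equal spreads in all groups give $\zeta = 1/N$, while a single group carrying all the variability gives $\zeta = 1$.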







8. Methods of determining $\zeta_\varepsilon$. To calculate $\zeta_\varepsilon$ accurately we must calculate
the integral probability law of $\zeta$, that is to say,

$$P\{\zeta < z\} = \int \cdots \int_{\zeta < z} p(u_1, \ldots, u_{N-1})\, du_1 \cdots du_{N-1} \tag{91}$$

for any $z$. The author was not able to achieve this. Therefore some methods
of approximation had to be looked for. This task becomes somewhat simplified
by noting that in most practical problems $N$ will be very large, in the hundreds
or thousands, while $n$ will probably not exceed 5.
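As a present-day cross-check on any approximation of $\zeta_\varepsilon$, the law (87) can be sampled directly. This is not a method used in the paper: the sketch below substitutes Monte Carlo simulation, using the standard fact that normalized independent gamma variates with shape $\tfrac12(n-1)$ follow the law (87). The function names and the choice of quantile rule are assumptions of this sketch.

```python
import random


def simulate_zeta(N, n, reps, seed=0):
    """Draw `reps` values of zeta under H.

    The u_i of (87) are generated as g_i / sum(g), with g_i independent
    gamma variates of shape (n - 1) / 2 and unit scale.
    """
    rng = random.Random(seed)
    a = (n - 1) / 2.0
    out = []
    for _ in range(reps):
        g = [rng.gammavariate(a, 1.0) for _ in range(N)]
        s = sum(g)
        out.append(sum(x * x for x in g) / (s * s))
    return out


def mc_critical_value(N, n, eps, reps=20000, seed=0):
    """Empirical upper eps-point of zeta, a Monte Carlo stand-in for zeta_eps."""
    zs = sorted(simulate_zeta(N, n, reps, seed))
    k = min(reps - 1, int((1.0 - eps) * reps))
    return zs[k]


if __name__ == "__main__":
    zs = simulate_zeta(N=50, n=3, reps=2000, seed=1)
    print("simulated mean of zeta:", sum(zs) / len(zs))
    print("approximate 5% point:", mc_critical_value(50, 3, 0.05, reps=2000, seed=1))
```

The accuracy of the empirical critical value depends on `reps`; it is meant as a check on the analytic approximations discussed in this section, not as a replacement for them.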

To start, we notice that the range of $\zeta$ is limited by

$$1/N \le \zeta \le 1. \tag{92}$$

The easiest way to see this is to look for maxima and minima of the sum

$$X = \sum_{i=1}^{N} S_i^4 \tag{93}$$

subject to the restriction that

$$\sum_{i=1}^{N} S_i^2 = T_1. \tag{94}$$

We then easily find that

$$T_1^2/N \le X \le T_1^2 \tag{95}$$

and (92) follows directly.
Since $\zeta$ is a polynomial of the second order in the $u$'s, we may consider its
moments. These will be functions of the expectations of the products $\prod_{i=1}^{N} u_i^{k_i}$
where, for short, $u_N = 1 - \sum_{i=1}^{N-1} u_i$. Using (87) we easily find that

$$E\Big(\prod_{i=1}^{N} u_i^{k_i}\Big) = \frac{\Gamma(\tfrac12 N(n-1))}{\Gamma\big(\tfrac12 N(n-1) + \sum_{i=1}^{N} k_i\big)} \prod_{i=1}^{N} \frac{\Gamma(\tfrac12(n-1) + k_i)}{\Gamma(\tfrac12(n-1))}. \tag{96}$$

In particular, if we let $(n - 1)/2 = a$

$$E(u_i^2) = \frac{a(a+1)}{Na(Na+1)} \tag{97}$$

$$E(u_i^4) = \frac{a(a+1)(a+2)(a+3)}{Na(Na+1)(Na+2)(Na+3)} \tag{98}$$

$$E(u_i^2 u_j^2) = \frac{a^2(a+1)^2}{Na(Na+1)(Na+2)(Na+3)}. \tag{99}$$

Consequently and because $\zeta = \sum_{i=1}^{N} u_i^2$, we have

$$E(\zeta) = \mu_1' = (a+1)/(Na+1) \tag{100}$$

$$E(\zeta^2) = \mu_2' = \sum_{i=1}^{N} E(u_i^4) + 2 \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} E(u_i^2 u_j^2) = \frac{(a+1)(a+2)(a+3)}{(Na+1)(Na+2)(Na+3)} + \frac{(N-1)a(a+1)^2}{(Na+1)(Na+2)(Na+3)}. \tag{101}$$

The variance of $\zeta$ is therefore

$$\sigma_\zeta^2 = \frac{2a(a+1)(N-1)}{(Na+1)^2(Na+2)(Na+3)}. \tag{102}$$

By a similar procedure we find that

$$E(\zeta^3) = \mu_3' = \frac{(a+1)(a+2)(a+3)(a+4)(a+5) + 3(N-1)a(a+1)^2(a+2)(a+3) + (N-1)(N-2)a^2(a+1)^3}{(Na+1)(Na+2)(Na+3)(Na+4)(Na+5)} \tag{103}$$

$$E(\zeta^4) = \mu_4' = \frac{\begin{aligned} &\textstyle\prod_{j=1}^{7}(a+j) + 4(N-1)a(a+1)\prod_{j=1}^{5}(a+j) + 3(N-1)a\prod_{j=1}^{3}(a+j)^2 \\ &\quad + 6(N-1)(N-2)a^2(a+1)^3(a+2)(a+3) + (N-1)(N-2)(N-3)a^3(a+1)^4 \end{aligned}}{\prod_{j=1}^{7}(Na+j)}. \tag{104}$$
One possible method of approximating $\zeta_\varepsilon$ is to use the formulae above, together
with the higher moments whose formulae are easy to deduce. Some convenient
known distribution, say $p_0(\zeta)$, could be fitted to have its first two or three moments
coincide with those of the unknown true distribution of $\zeta$. We would
then look for better approximations by means of the functions

$$p_m(\zeta) = p_0(\zeta) \sum_{k=0}^{m} A_k \pi_k(\zeta) \tag{105}$$

where the $\pi_k$'s denote polynomials which are orthogonal and normal with respect
to $p_0(\zeta)$ so that

$$\int \pi_j \pi_k\, p_0(\zeta)\, d\zeta = \begin{cases} 1 & \text{if } j = k \\ 0 & \text{if } j \ne k. \end{cases} \tag{106}$$

The constant coefficients $A_k$ are formed to minimize the integral

$$\int \Big(p(\zeta) - p_0(\zeta) \sum_{k=0}^{m} A_k \pi_k(\zeta)\Big)^2 p_0^{-1}(\zeta)\, d\zeta. \tag{107}$$

They are expressible in terms of the known moments of $p(\zeta)$.







This is one possible way to approximate $p(\zeta)$ which would eventually lead to
the computation of $\zeta_\varepsilon$ even for small values of $N$.

Remembering that we are concerned with large $N$'s, we can prove that the
normalized distribution of $\zeta$, that is, the distribution of

$$\frac{\zeta - \mu_1'}{\sigma_\zeta} \tag{108}$$

tends to be normal as $N \to \infty$. However, the process of tending to the limit
is rather slow as may be seen from the following table of K. Pearson's $\beta_1$ and $\beta_2$.

TABLE II
Frequency constants of the distribution of ζ

 n     N      μ'_1     σ_ζ        β_1      β_2
 3     100    .0198    .001922    .8652    5.042
 3     200    .0099    .000693    .4618    4.244
 3     400    .0050    .000248    .2410    3.587


Because of this and also because the proof that the distribution of (108) tends
to normality is not very straightforward, we shall not reproduce it. But it may
be well to point out that the cause of this slowness in tending to the limit lies
in the skewness of the distribution of each particular $u_i$ and in the mutual
dependency of all the $u_i$'s.

The most promising method seems to be the following. First consider the
two sums

$$T_1 = \sum_{i=1}^{N} S_i^2 \quad \text{and} \quad T_0 = \sum_{i=1}^{N} S_i^4. \tag{109}$$

Obviously, these two sums satisfy the conditions of the limiting theorem of
S. Bernstein [29], [30] and, therefore, as $N \to \infty$, their joint normalized distribution
tends to a normal surface. Also, we may expect the process of tending
to the limit to be rapid in this case. If $p(T_0, T_1)$ denotes the limiting normal
distribution, the probability that $\zeta > z$ can be approximately calculated by the
integral

$$P\{\zeta > z\} = P\{T_0 > zT_1^2\} \approx \int_{-\infty}^{+\infty} dT_1 \int_{zT_1^2}^{+\infty} p(T_0, T_1)\, dT_0. \tag{110}$$


To calculate the limiting distribution $p(T_0, T_1)$ we need only the expectations,
say $A$ and $B$, of $T_1$ and $T_0$ respectively, their standard errors, say $\sigma_1$ and $\sigma_2$,
and their correlation coefficient $R$. These may be obtained from the moments
of the $S_i^2$'s.

Formula (110) can be used not only for tabulating the integral probability
law of $\zeta$ and for determining $\zeta_\varepsilon$, but also for an approximate calculation of the
power function of the test. For, if the limiting probability law $p(T_0, T_1)$ is
calculated using the moments of $S_i^2$ calculated from (70) with some $\nu > 0$, then
the integral (110) calculated with $z = \zeta_\varepsilon$ gives us the probability $P\{\zeta > \zeta_\varepsilon \mid \nu\}$
of the test detecting the falsehood of the hypothesis tested, that is, the power
function.

To save space, we shall now calculate the constants $A$, $B$, $\sigma_1$, $\sigma_2$, and $R$ as
functions of $\nu \ge 0$. The values appropriate to the case when the hypothesis
tested is true will then be obtained from the general formulae by the mere
substitution of $\nu = 0$.


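Since all the constants above depend on the expectations of $S_i^{2k}$, we use formula
(70) to calculate them. Denoting the expectation of $S_i^{2k}$ by $\mu_k$, we have

$$\mu_k = \frac{2(\tfrac12 n h_0 \nu)^{\frac12(n-1)}}{B(1/\nu, \tfrac12(n-1))} \int_0^{\infty} \frac{S^{2k+n-2}}{(1 + \tfrac12 n h_0 \nu S^2)^{\frac12(n-1) + 1/\nu}}\, dS. \tag{111}$$

Introducing the new variable

$$1 + \tfrac12 n h_0 \nu S^2 = t^{-1} \tag{112}$$

makes the integration straightforward and gives

$$\mu_k = \Big(\frac{2}{n h_0 \nu}\Big)^{k} \frac{\Gamma(1/\nu - k)\, \Gamma(\tfrac12(n-1) + k)}{\Gamma(1/\nu)\, \Gamma(\tfrac12(n-1))}. \tag{113}$$

This formula holds good if $1/\nu > k$. Otherwise the $k$th moment $\mu_k$ is divergent.
So this approximate method of calculating the power function of the test is
applicable only for $\nu < .25$.
Substituting $k = 1, 2, 3, 4$ in (113), we have

$$\begin{aligned}
\mu_1 &= \frac{1}{n h_0} \cdot \frac{n-1}{1-\nu}, \\
\mu_2 &= \Big(\frac{1}{n h_0}\Big)^{2} \frac{n^2-1}{(1-\nu)(1-2\nu)}, \\
\mu_3 &= \Big(\frac{1}{n h_0}\Big)^{3} \frac{(n^2-1)(n+3)}{(1-\nu)(1-2\nu)(1-3\nu)}, \\
\mu_4 &= \Big(\frac{1}{n h_0}\Big)^{4} \frac{(n^2-1)(n+3)(n+5)}{(1-\nu)(1-2\nu)(1-3\nu)(1-4\nu)},
\end{aligned} \tag{114}$$

and now we have

$$A = \frac{N}{n h_0} \cdot \frac{n-1}{1-\nu}, \qquad B = \frac{N}{(n h_0)^2} \cdot \frac{n^2-1}{(1-\nu)(1-2\nu)}, \tag{115}$$

$$\sigma_1^2 = \frac{N}{(n h_0)^2} \cdot \frac{(n-1)(2 + \nu(n-3))}{(1-\nu)^2(1-2\nu)}, \tag{116}$$

$$\sigma_2^2 = \frac{2N}{(n h_0)^4} \cdot \frac{(n^2-1)(2 + \nu(n-3))(2(n+2) - \nu(5n+7))}{(1-\nu)^2(1-2\nu)^2(1-3\nu)(1-4\nu)}, \tag{117}$$

$$R^2 = \frac{2(n+1)(1-2\nu)(1-4\nu)}{(2(n+2) - \nu(5n+7))(1-3\nu)}. \tag{118}$$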
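The moments (113) and their closed forms (114) are easy to cross-check numerically. The sketch below is such a check, with hypothetical function names `mu` and `mu_closed`; it evaluates the gamma-function ratio through `math.lgamma` to avoid overflow when $1/\nu$ is large.

```python
from math import exp, lgamma


def mu(k, n, h0, nu):
    """k-th moment of S_i^2 from (113); requires 0 < nu < 1/k."""
    if not (0.0 < nu < 1.0 / k):
        raise ValueError("the k-th moment exists only for 0 < nu < 1/k")
    a = (n - 1) / 2.0
    log_ratio = (lgamma(1.0 / nu - k) + lgamma(a + k)
                 - lgamma(1.0 / nu) - lgamma(a))
    return (2.0 / (n * h0 * nu)) ** k * exp(log_ratio)


def mu_closed(k, n, h0, nu):
    """Closed forms (114) for k = 1, 2, 3, 4."""
    numerators = {
        1: n - 1,
        2: n * n - 1,
        3: (n * n - 1) * (n + 3),
        4: (n * n - 1) * (n + 3) * (n + 5),
    }
    den = 1.0
    for j in range(1, k + 1):
        den *= 1.0 - j * nu
    return numerators[k] / ((n * h0) ** k * den)


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(k, mu(k, n=3, h0=2.0, nu=0.1), mu_closed(k, n=3, h0=2.0, nu=0.1))
```

Multiplying $\mu_1$ and $\mu_2$ by $N$ gives the expectations $A$ and $B$; the variances and the correlation follow from $N(\mu_2 - \mu_1^2)$, $N(\mu_4 - \mu_2^2)$ and $N(\mu_3 - \mu_1\mu_2)$.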

















Inspecting formulae (115) to (118) makes us see that there is an advantage
in substituting two new variables

$$t_1 = \frac{n h_0}{N(n-1)}\, T_1, \qquad t_2 = \frac{(n h_0)^2}{N(n^2-1)}\, T_0, \tag{119}$$

for $T_1$ and $T_0$. Their expectations, say $\vartheta_1$ and $\vartheta_2$, are

$$\vartheta_1 = \frac{1}{1-\nu}, \qquad \vartheta_2 = \frac{1}{(1-\nu)(1-2\nu)}. \tag{120}$$

Probably without any danger of confusion, the S.E.'s of $t_1$ and $t_2$ may be denoted
by $\sigma_1$ and $\sigma_2$ also and we shall have

$$\sigma_1^2 = \frac{2 + \nu(n-3)}{N(n-1)(1-\nu)^2(1-2\nu)}, \qquad \sigma_2^2 = \frac{2(2 + \nu(n-3))(2(n+2) - \nu(5n+7))}{N(n^2-1)(1-\nu)^2(1-2\nu)^2(1-3\nu)(1-4\nu)}. \tag{121}$$

Of course, the correlation coefficient of $t_1$ and $t_2$ is the same as that of $T_1$ and $T_0$,
namely $R$. Obviously, the inequality $T_0 > zT_1^2$ is equivalent to $t_2 > z_1 t_1^2$ provided
that

$$z = \frac{n+1}{N(n-1)}\, z_1. \tag{122}$$

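Now the problem of calculating (110) is reduced to finding

$$P\{\zeta > z\} = P\{t_2 > z_1 t_1^2\} \approx \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-R^2}} \int_{-\infty}^{+\infty} \int_{z_1 t_1^2}^{+\infty} \exp\Big\{-\frac{1}{2(1-R^2)} \Big[\frac{(t_1-\vartheta_1)^2}{\sigma_1^2} - 2R\,\frac{(t_1-\vartheta_1)(t_2-\vartheta_2)}{\sigma_1\sigma_2} + \frac{(t_2-\vartheta_2)^2}{\sigma_2^2}\Big]\Big\}\, dt_2\, dt_1. \tag{123}$$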

We may conveniently see the workings of the test proposed by considering formula
(123). First consider the case when the hypothesis tested is true. Both
$\vartheta_1$ and $\vartheta_2$ reduce to unity. The region of highest frequency is around the point
$t_1 = t_2 = 1$. If $N$ is large then both $\sigma_1$ and $\sigma_2$ are small so that the region of
significant frequency is rather small. The integral (123) is to be taken over
the region above the parabola $t_2 = z_1 t_1^2$ passing through the origin of coordinates.
When $z_1$ is small and the parabola passes far below the point $t_1 = t_2 = 1$, the
probability $P\{\zeta > z\}$ will be close to unity. When $z_1 = 1$ this probability will
be less than $\tfrac12$ and it will diminish rapidly with further increases of $z_1$. Now
suppose that we have found the value $\zeta_\varepsilon$ for which $P\{\zeta > \zeta_\varepsilon \mid \nu = 0\} = \varepsilon$ and
consider what will happen to (123) when $z = \zeta_\varepsilon$ if $\nu$ is increased. Clearly, neither
$\sigma_1$ and $\sigma_2$ nor $R$ are very sensitive to slight changes in $\nu$. Also $\vartheta_1$ will not
change very much. On the other hand, $\vartheta_2$ will increase rather fast. The final
conclusion is that the whole frequency surface corresponding to the integrand in
(123) will not change shape much but will move so as to bring a greater amount of
frequency into the region of integration.
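To facilitate numerical calculations introduce

\[ x = \frac{t_1-\vartheta_1}{\sigma_1}, \qquad y = \frac{t_2-\vartheta_2-R\sigma_2(t_1-\vartheta_1)/\sigma_1}{\sigma_2\sqrt{1-R^2}}. \tag{124} \]

Now (123) may be rewritten as

\[ P\{\zeta > z\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-x^2/2}\left[\frac{1}{\sqrt{2\pi}}\int_{\varphi(z,x)}^{\infty} e^{-y^2/2}\,dy\right]dx, \tag{125} \]

where

\[ \varphi(z,x) = \frac{z_1(\vartheta_1+\sigma_1 x)^2-\vartheta_2-R\sigma_2 x}{\sigma_2\sqrt{1-R^2}}. \tag{126} \]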


Using formulae (125), (126) and (119) to (122), the following numerical 
values were obtained. 


TABLE III
n = 3, N = 100, ν = 0

    z_1        P{ζ > z | ν = 0}

    0.8        .9126
    0.9        .7305
    1.0        .4905
    1.1        .2847
    1.2        .1495
    1.3        .0730
    1.4        .0335
    1.5        .0148
    1.6        .00644
    1.7        .00288

    1.34450    .05000
    1.54563    .01000


TABLE IV
Power of the test for n = 3 and N = 100

    ε      ζ_ε       ν = .01    ν = .16

    .05    .02689    .05823     .37482
    .01    .03091    .01234     .10699
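The quantities entering (119) to (122) are simple enough to tabulate mechanically. The following is a minimal sketch, not the author's computing scheme: the function names `neyman_parameters` and `z_from_z1` are hypothetical, and the expression used for $R^2$ is an assumption, re-derived from the moments behind (115) to (117) because the printed form of (118) is damaged. As a consistency check, the critical values 1.34450 and 1.54563 of $z_1$ in Table III, multiplied by $(n+1)/N(n-1) = .02$, give the values .02689 and .03091 of $\zeta_\epsilon$ listed in Table IV.

```python
from math import sqrt


def neyman_parameters(n, N, nu):
    """Return (theta1, theta2, sigma1, sigma2, R) for t1 and t2.

    theta1, theta2 follow (120); sigma1, sigma2 follow the S.E. formulas
    given after (120); R uses an expression for R^2 that is an assumption
    (re-derived, since the printed formula (118) is not legible).
    """
    theta1 = 1.0 / (1.0 - nu)
    theta2 = 1.0 / ((1.0 - nu) * (1.0 - 2.0 * nu))
    lin = 2.0 + nu * (n - 3)
    quad = 2.0 * (n + 2) - nu * (5 * n + 7)
    var1 = lin / (N * (n - 1) * (1.0 - nu) ** 2 * (1.0 - 2.0 * nu))
    var2 = (2.0 * lin * quad
            / (N * (n * n - 1) * (1.0 - nu) ** 2 * (1.0 - 2.0 * nu) ** 2
               * (1.0 - 3.0 * nu) * (1.0 - 4.0 * nu)))
    r_squared = (2.0 * (n + 1) * (1.0 - 2.0 * nu) * (1.0 - 4.0 * nu)
                 / ((1.0 - 3.0 * nu) * quad))
    return theta1, theta2, sqrt(var1), sqrt(var2), sqrt(r_squared)


def z_from_z1(z1, n, N):
    """Convert a limit z1 for t2 / t1^2 into the limit z of the criterion, as in (122)."""
    return (n + 1) * z1 / (N * (n - 1))


if __name__ == "__main__":
    print(neyman_parameters(3, 100, 0.0))
    for z1 in (1.34450, 1.54563):
        print(f"{z_from_z1(z1, 3, 100):.5f}")
```

For $n = 3$, $N = 100$, $\nu = 0$ the sketch gives $\sigma_1^2 = .01$, $\sigma_2^2 = .05$ and $R^2 = .8$.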


The figures above are only approximate, and we realize that the greater the value of $\nu$ the less satisfactory is the approximation of the power function. A check of the goodness of the approximation and, if it proves satisfactory, a few numerical tables for practical applications of the test must be postponed to another publication.

It is a pleasure to record the author's indebtedness to Miss Elizabeth Scott and also to Miss Julia Bowman for carrying out all the numerical work connected with the present paper.


REFERENCES 


[1] J. Neyman, Jour. Roy. Stat. Soc., Vol. 97 (1934), p. 558. 

[2] J. Neyman, Phil. Trans. Roy. Soc. London, Vol. 236-A (1937), p. 333. 

[3] ‘‘Student’’, Biometrika, Vol. 19, (1927), p. 151. 

[4] J. Przyborowski, Roczniki Nauk Rolniczych i Leśnych, Vol. 30 (1933), p. 1.

[5] J. Neyman, Lectures and conferences on mathematical statistics, Washington, D. C. 
(1937). 

[6] J. Berkson, Jour. Am. Stat. Assn., Vol. 33, (1938), p. 526. 

[7] J. Berkson, Jour. Am. Stat. Assn., Vol. 35, (1940), p. 362. 

[8] J. Neyman, Annals of Math. Stat., Vol. 11 (1940), p. 478. 

[9] J. Neyman and E. S. Pearson, Bull. Int. Acad. Polon. Sci. Cracovie, (A) Vol. 6 (1931), 
p. 460. 

[10] B. L. Welch, Biometrika, Vol. 27 (1935), p. 145. 

[11] B. L. Welch, Stat. Res. Memoirs, Vol. 1 (1936), p. 52. 

[12] U. S. Nair, Stat. Res. Memoirs, Vol. 1 (1936), p. 38.

[13] M. S. Bartlett, Proc. Roy. Soc., Vol. 160-A (1937), p. 268.

[14] S. S. Wilks and C. M. Thompson, Biometrika, Vol. 29 (1937), p. 124.

[15] D. T. Bishop and U. S. Nair, Jour. Roy. Stat. Soc. Suppl., Vol. 6 (1939), p. 89. 

[16] E. J. G. Pitman, Biometrika, Vol. 31, (1939), p. 200. 

[17] H. O. Hartley, Biometrika, Vol. 31 (1940), p. 249.

[18] J. Neyman and E. S. Pearson, Phil. Trans. Roy. Soc. London, Vol. 231-A (1933), p. 289.

[19] J. Neyman and E. S. Pearson, Stat. Res. Memoirs, Vol. 1 (1936), p. 1.

[20] J. Neyman, Bull. Soc. Math. France, Vol. 63 (1935), p. 246.

[21] H. Hotelling, Ann. Math. Stat., Vol. 11 (1940), p. 271.

[22] W. Feller, Stat. Res. Memoirs, Vol. 2 (1938), p. 117. 

[23] H. Cramér and H. Wold, Jour. London Math. Soc., Vol. 11 (1936), p. 291. 

[24] K. Pearson, Tables of the Incomplete Beta Function, Biometrika Office, University 
College, London, 1934. 

[25] R. T. Birge, Astrophysical Journal, Vol. 39 (1914), p. 50. 

[26] R. T. Birge, Physical Review, Vol. 40 (1932), p. 207.

[27] A. A. Michelson, F. G. Pease, and F. Pearson, Astrophysical Journal, Vol. 82 (1935), p. 20.

[28] J. Neyman and E. S. Pearson, Stat. Res. Memoirs, Vol. 2 (1938), p. 25.

[29] S. Bernstein, Math. Ann., Vol. 97 (1926), p. 44. 

[30] W. Kozakiewicz, Ann. Soc. Polon. Math., Vol. 13 (1934), p. 24. 





A CONCISE ANALYSIS OF CERTAIN ALGEBRAIC FORMS 


By FRANKLIN E. SATTERTHWAITE 


State University of Iowa, Iowa City, Iowa 


Many of the statistics in common use are functions of homogeneous algebraic 
forms in the items of the sample. Among such statistics are the mean, a linear 
form; the variance, a quadratic form; and the product moment, a bilinear form. 
With the extension of the science, the mathematical statistician is faced with 
the study of more complex statistics and the associated algebraic forms and 
matrices. The purpose of this paper is to set forth concise and efficient nota- 
tions and methods which may be used in such analysis. 

We shall borrow the essential features of our notation from differential geometry and tensor analysis. The Kronecker delta is defined as

\[ \delta^i_j = 1, \quad i = j; \qquad \delta^i_j = 0, \quad i \neq j. \]


The summation convention provides that summation be performed with respect to any index appearing twice in the same term. Thus,

\[ x_i y^i = x_1 y^1 + x_2 y^2 + \cdots. \]

To extend the use of the summation convention, we shall frequently place indices on the numeral, 1. Thus,

\[ 1^i x_i = 1^1 x_1 + 1^2 x_2 + \cdots = x_1 + x_2 + \cdots. \]

Symmetry in the calculations is more striking if the pair of summation indices appears, one as a superscript, the other as a subscript. Therefore we allow the shifting of an index from the one position to the other at will. Thus,

\[ x^i = x_i. \]


Where no confusion will arise, indices may be placed outside of parentheses. 


sa 7 2 (2)F) 
(«0 a sme 


The standard notations for averages will be used.

\[ \bar{x} = \frac{1}{a}\,\Sigma\, x_i = (1/a)^i x_i. \tag{1} \]


(2) 





Unless otherwise indicated, the symbol $\Sigma$ will always stand for summation over all unrepeated indices, including any already averaged under conventions (1) and (2). Thus,

\[ \Sigma\,\bar{x}^2 = a\,\bar{x}^2. \]
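The following simple formulas are fundamental to the arithmetic of this notation. They are obvious upon the expansion of the summations. Each index varies from 1 to $a$. These formulas are

\[ \delta^i_j x_i = x_j, \qquad \delta^i_j \delta^j_k = \delta^i_k, \qquad \delta^i_j 1^j_k = 1^i_k, \qquad 1^i_j 1^j_k = a\,1^i_k, \qquad \delta^i_i = a, \]

\[ x_i - \bar{x} = \delta^j_i x_j - (1/a)^j_i x_j = (\delta - 1/a)^j_i x_j. \]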


The symbols of this notation obey the associative, commutative, and the distributive laws of simple arithmetic so that the operations of summation, multiplication, and squaring are very easy. Thus for the product of two linear forms we have

\[ \bar{x}\,\bar{y} = (1/a)^i x_i\,(1/a)^j y_j = (1/a^2)^{ij} x_i y_j. \]


The sum of squares is obtained by the simple repetition of the form,

\[ \Sigma\, x_i^2 = \Sigma\,(\delta^j_i x_j)^2 = (\delta^j_i x_j)(\delta^k_i x_k) = (\delta^j_i x_j)(\delta^i_k x^k) = \delta^j_k x_j x^k. \tag{3} \]


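Two other sums of squares occur so frequently that they should be particularly noted:

\[ \Sigma\,\bar{x}^2 = \Sigma\,[(1/a)^j_i x_j]^2 = (1/a)^j_i x_j\,(1/a)^i_k x^k = (1/a)(1/a)\,1^j_i 1^i_k\, x_j x^k = (1/a)^j_k x_j x^k, \tag{4} \]

\[ \begin{aligned} \Sigma\,(x_i - \bar{x})^2 &= \Sigma\,[(\delta - 1/a)^j_i x_j]^2 = (\delta - 1/a)^j_i (\delta - 1/a)^i_k\, x_j x^k, \\ &= \left(\delta^j_i \delta^i_k - \tfrac{1}{a} 1^j_i \delta^i_k - \tfrac{1}{a} \delta^j_i 1^i_k + \tfrac{1}{a}\tfrac{1}{a} 1^j_i 1^i_k\right) x_j x^k, \\ &= \left(\delta - \tfrac{1}{a} - \tfrac{1}{a} + \tfrac{1}{a}\right)^j_k x_j x^k, \\ &= (\delta - 1/a)^j_k x_j x^k. \end{aligned} \tag{5} \]

The striking similarity in the coefficients of the second and final expressions for the summations in (3), (4), and (5) should not be overlooked.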
Where we have multiple classification of the variables, we may operate on 
each index separately. For example, in a four-way analysis of variance we may 
have the quadratic form, 


Q= LD { Zin. — Xy.. — Tr. + %;...}?, 


1 1 11\1"°" ° 

2 | a( se eos +p 1)iT" toner} 
1 1 1 mnop 2 
={[a(3 7 i) ¢ - :) ah taue) 


-8( 6-1 Olmert 


The rank is one of the important properties of a quadratic form or matrix. 
An experienced mathematician usually has a rule of thumb for determining the 
ranks of those quadratic forms occurring in statistical analysis. In order to 
formulate such rules of thumb into a simple and rigorous algebra, the author 
here defines a type of matrix multiplication which he calls "uncontracted matrix multiplication" and which he represents by the symbol $\odot$.

Let $A = \| a^i_j \|$ and $B = \| \beta^k_l \|$ be two matrices of any finite orders and with ranks $R_A$ and $R_B$. We define the uncontracted product, $A \odot B$, as follows:

\[ C = A \odot B = \| a^i_j \| \odot B = \| a^i_j B \| = \begin{Vmatrix} a^1_1 B & a^1_2 B & \cdots \\ a^2_1 B & a^2_2 B & \cdots \\ \vdots & \vdots & \end{Vmatrix}. \]





Thus the elements of $C$ are

\[ \gamma^{ik}_{jl} = a^i_j \beta^k_l. \]

We therefore see that whenever we have a matrix whose elements can be factored in the above manner, then the matrix can be expressed as the uncontracted product of simple matrices. Thus, if

\[ \| \gamma^{ik\cdots}_{jl\cdots} \| = \| a^i_j \beta^k_l \cdots \|, \]

then

\[ \| \gamma^{ik\cdots}_{jl\cdots} \| = \| a^i_j \| \odot \| \beta^k_l \| \odot \cdots. \]


We shall now prove that the rank of the uncontracted product, $C = A \odot B$,
of two matrices is equal to the product of the ranks. This follows because for 
the matrix, A, there always exists a set of elementary transformations defined 
by the equations, 


T.: ah = (2)(5) anor a. , a #0, i =j, 


where the 62’s, i = j, are coefficients providing for the multiplication of the ele- 
ments of a row by a constant not zero; the 6}’s, 1 ¥ j, are coefficients providing 
for the addition to the elements of a row a linear function of the corresponding 
elements of the other rows; the 6’s are similar coefficients referring to columns; 


the symbol (?) is an operator indicating the interchange of the ith and jth 


rows (columns); and the 4é’s have the values, 
46; = 1, +=jc Ra, 
= 0, otherwise. 


This set of transformations reduces A to a diagonal matrix with R, non-zero 
elements. A similar set of transformations, 


; : P= l ts 
Ts: BoE = (‘) (;)¢ On Gr rh, 


exists for the matrix B. We next define two sets of transformations by the 


equations, 
a: (ail at) = (21) (7) snore ob, 


Ts: (48% pok) = pi wt (i) ¢ Pn Yr (48; ), 





which are also elementary because of their relationship to $T_A$ and $T_B$. Now if we subject the matrix, $C = \| (a^i_j \beta^k_l) \|$, to the transformations $T'_A$ followed by the transformations $T'_B$, it will be reduced to the diagonal form $C = \| ({}_A\delta^i_j\, {}_B\delta^k_l) \|$ with exactly $R_A R_B$ non-zero elements. Therefore, since the rank of a matrix is invariant under elementary transformations, the rank of $C = A \odot B$ must be $R_A R_B$.
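The theorem is easily tried on small cases. The following is a minimal sketch, assuming that the uncontracted product is formed block by block as in the definition above (so that it coincides with what is now usually called the Kronecker product); the function names `uncontracted_product` and `rank` are hypothetical, and the rank is found by exact elimination over the rationals rather than by the transformations of the proof.

```python
from fractions import Fraction


def uncontracted_product(A, B):
    """Block matrix ||a_ij B||: the entry in row (i, k), column (j, l) is a_ij * b_kl."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def rank(M):
    """Rank of a matrix by exact Gaussian elimination (rows are reduced in place on a copy)."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            if rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


if __name__ == "__main__":
    A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]   # rank 2
    B = [[1, 2], [3, 4]]                    # rank 2
    C = uncontracted_product(A, B)
    print(rank(A), rank(B), rank(C))
```

Exact rational arithmetic is used so that the rank is not disturbed by rounding.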


We shall now determine the ranks of several matrices which occur frequently in statistics:

\[ A_1 = \| 1_i \| = \| 1, 1, 1, \cdots \|, \qquad R_1 = 1. \]

\[ A_2 = \| 1^i_j \| = \| 1^i 1_j \| = \| 1^i \| \odot \| 1_j \|, \qquad R_2 = 1 \cdot 1 = 1. \]

\[ A_3 = \| \delta^i_j \|, \qquad R_3 = a. \]

\[ A_4 = \| (\delta - 1/a)^i_j \|, \qquad R_4 = a - 1. \]

The proof that $R_4 = a - 1$ involves two steps. First, summing the rows of $A_4$, we have,

\[ 1_i (\delta - 1/a)^i_j = 1_j - a\,(1/a)_j = 0, \]

so that $R_4 \le a - 1$. Second, if we subtract the elements of the first row from the corresponding elements of each of the other rows we obtain,

\[ A_4 \sim \begin{Vmatrix} 1 - 1/a & -1/a \\ -1 & \delta^i_j \end{Vmatrix}, \qquad i, j \neq 1. \]

Since the $(a - 1)$st order determinant in the lower right-hand corner is not equal to zero, $R_4 \ge a - 1$.


Applying our theorem on uncontracted products, the ranks of complicated 
matrices can often be determined by inspection. Thus: 


u=|a(+-1) | -uei 02h 
R; 

Ag 

Rg = (a — 1)(b — 1). 


w= H(e-9- orl Poll 


R; -l=1. 

















The matrix $A_7$ may be confusing at first sight. Note that each element, $a^i_j$, is a quadratic form in the $y$'s. This form is of rank 1 and can be factored into two linear factors, one independent of $j$, the other independent of $i$.

To illustrate the application of these techniques to a fairly complicated prob- 
lem, we shall construct and verify a design for the analysis of variance involving 
a regression line. It is known that sufficient conditions for such a design to 
be valid are: 

1. The sum of the quadratic forms be equal to the sum of the squares of the 
variables, and 


2. The sum of the ranks of the forms be equal to the number of variables. 
We shall use the first condition to set up our design. Thus, 


D2i; = [d8]i} tux”, 
“{e-e5-2e TE 
e-nT+[6-DG-Dee QQ] 
+ [64-20 Dro MQ)" 


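Rewriting this in the usual notation, we have for our tentative design,

\[ \Sigma x_{ij}^2 = \Sigma (x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x})^2 + \Sigma \bar{x}^2 + \Sigma (\bar{x}_{\cdot j} - \bar{x})^2 + \Sigma [(r\sigma_x/\sigma_y)(y_i - \bar{y})]^2 + \Sigma [\bar{x}_{i\cdot} - \bar{x} - (r\sigma_x/\sigma_y)(y_i - \bar{y})]^2. \tag{7} \]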


In order to determine the corresponding equation for the ranks, we rewrite (6) 
in the form, 


= {(-3).6- 5), + GG), + @).6- 9), 
zat, = {(3 =) (3 5), + (3 Ab); Ta) ~ 6), 
i\ ,| 1\' 1\/1\ 
® + [(6~ 3). JL(~ 3) cas), 
1\" 1\" YY. f% 
#/e=)). (3 - 3) ~j)vu (ca) 5) jee 
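First we must determine the rank of the unfamiliar matrix,

\[ A_8 = \| a^i_j \| = \left\| (\delta - 1/a)^i_j - \frac{(\delta - 1/a)^i_k\,(\delta - 1/a)^l_j\, y^k y_l}{a\sigma_y^2} \right\|. \]

We see that the rank of $A_8$ cannot be greater than $a - 2$ because two linear relations exist between the rows, namely,

\[ 1_i a^i_j = 0, \quad \text{since } (\delta - 1/a)^i_j\, 1_i = 0, \]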


+ 


(6) 


Rie 


(7) 








\[ y_i a^i_j = 0, \quad \text{since } (\delta - 1/a)^i_k\, y_i y^k = a\sigma_y^2. \]





To show that the rank of $A_8$ cannot be less than $a - 2$, we subtract the elements of the first row from the corresponding elements of each of the last $a - 2$ rows, giving,

1 2 i 
As ~ ay | ay | ay 


a a es | ee 
aj — ay oo t LY’ (5! : 
—(5 — -) yi — diy b ys; — didye 

a/s BE nce Danneel temrerggeremneemnenennen 


a 
| acy | acy 


i#1,2 


Multiplying each element of the second column by $-(\delta - 1/a)^k_j\, y_k \big/ (\delta - 1/a)^k_2\, y_k$ and adding the result to the corresponding element of the $j$th column for $j = 3, 4, \cdots, a$, we see that the $(a - 2)$th order determinant in the lower right-hand corner becomes $|\delta^i_j|$, which is not equal to zero. Therefore the rank of $A_8$ must equal $a - 2$.
Referring to equation (8), we now write down the corresponding equation for ranks using the theorem on uncontracted products. Thus,

\[ \Sigma\,\text{Ranks} = (a - 1)(b - 1) + (1)(1) + (1)(b - 1) + (1)(1)(1) + (a - 2)(1) = ab. \]
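Both conditions of the design can be checked numerically for a small layout. The following is a minimal sketch under stated assumptions: the five component matrices are taken to be $(\delta - 1/a) \odot (\delta - 1/b)$, $(1/a) \odot (1/b)$, $(1/a) \odot (\delta - 1/b)$, $A_7 \odot (1/b)$ and $A_8 \odot (1/b)$, with $A_7$ the rank-one regression matrix and $A_8 = (\delta - 1/a) - A_7$; the values $a = 4$, $b = 3$ and the $y$'s are arbitrary, and all function names are hypothetical.

```python
from fractions import Fraction


def kron(A, B):
    """Uncontracted (block) product ||a_ij B||."""
    return [[x * y for x in ra for y in rb] for ra in A for rb in B]


def rank(M):
    """Exact rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def eye(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def mean_matrix(n):
    """The matrix ||(1/n) 1_ij||, which averages over an index."""
    return [[Fraction(1, n)] * n for _ in range(n)]


def add(A, B, sign=1):
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


a, b = 4, 3
y = [1, 2, 4, 7]                                  # arbitrary regression variable
Ca, Cb = add(eye(a), mean_matrix(a), -1), add(eye(b), mean_matrix(b), -1)
dev = [v - Fraction(sum(y), a) for v in y]        # y_i minus its mean
ss = sum(d * d for d in dev)                      # a * sigma_y^2
A7 = [[di * dj / ss for dj in dev] for di in dev]
A8 = add(Ca, A7, -1)

forms = [kron(Ca, Cb), kron(mean_matrix(a), mean_matrix(b)),
         kron(mean_matrix(a), Cb), kron(A7, mean_matrix(b)),
         kron(A8, mean_matrix(b))]
ranks = [rank(F) for F in forms]
total = forms[0]
for F in forms[1:]:
    total = add(total, F)

if __name__ == "__main__":
    print(ranks, sum(ranks) == a * b, total == eye(a * b))
```

The five matrices add up to the unit matrix of order $ab$ (condition 1) and their ranks add up to $ab$ (condition 2).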


Hence the quadratic forms in the right member of equation (7) are mutually 
independent and each, measured in units of the variance of the population, is 
distributed as is Chi-square with the appropriate number of degrees of freedom. 





A SYMMETRIC METHOD OF OBTAINING UNBIASED ESTIMATES 
AND EXPECTED VALUES 


By PAUL L. DRESSEL
Michigan State College, East Lansing, Michigan 


The problem of finding the relationship between moment functions of a sample and moment functions of the population from which the sample was obtained has, of necessity, received much attention. The problem has two parts: first, to find the expected value of a given sample moment function; second, to find the estimate of a given population moment function. Thus, if $m_i$ represent the $i$th central moment of a sample and $\mu_i$ represent the $i$th central moment of the population, the first part of the problem requires that we find the mean value of $m_i$ for all possible samples of a given size and express it in terms of the $\mu_i$'s. The second part requires that we find a function of the $m_i$'s such that the mean value, taken for all possible samples of a given size, be a given $\mu_i$. For the case $i = 4$ we have the well known results:

\[ E[m_4] = \frac{(n-1)(n^2-3n+3)}{n^3}\,\mu_4 + \frac{3(n-1)(2n-3)}{n^3}\,\mu_2^2, \]

\[ E'[\mu_4] = \frac{n(n^2-2n+3)}{(n-1)(n-2)(n-3)}\,m_4 - \frac{3n(2n-3)}{(n-1)(n-2)(n-3)}\,m_2^2. \]
These results are based on the assumption of an infinite population. In spite of the inverse relationship existing between estimates and expected value, the expressions above show no simple relationship. This lack of simplicity of relationship between estimate and expected value is directly traceable to the fact that such results are usually obtained for infinite populations. When results are obtained for finite populations a symmetry is found to exist which reduces to a single problem the two parts stated above. Since this should be evident to anyone upon reflection, the main purpose of the present paper may be considered as that of indicating one method of demonstrating the result stated above as well as showing the relationship of this method to material appearing in previously published papers.
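The two infinite-population results quoted above for $i = 4$ can be checked exactly on a small case. The following is a minimal sketch, assuming independent draws from a population in which a few arbitrary values are equally likely (which is equivalent to sampling from an infinite population with that distribution); every possible sample of size $n$ is enumerated and rational arithmetic is used throughout. The function names and the chosen values are hypothetical.

```python
from fractions import Fraction
from itertools import product


def central_moment(xs, k):
    """k-th central moment of xs, with divisor len(xs)."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** k for x in xs) / n


def check_fourth_moment(values, n):
    """Return (E[m4] agrees with its formula, the estimate of mu4 is unbiased)."""
    mu2 = central_moment(values, 2)
    mu4 = central_moment(values, 4)
    samples = list(product(values, repeat=n))      # all equally likely samples
    d = (n - 1) * (n - 2) * (n - 3)

    def estimate(s):
        m2, m4 = central_moment(s, 2), central_moment(s, 4)
        return (n * (n * n - 2 * n + 3) * m4
                - 3 * n * (2 * n - 3) * m2 ** 2) / d

    e_m4 = sum(central_moment(s, 4) for s in samples) / len(samples)
    e_est = sum(estimate(s) for s in samples) / len(samples)
    formula = (Fraction((n - 1) * (n * n - 3 * n + 3), n ** 3) * mu4
               + Fraction(3 * (n - 1) * (2 * n - 3), n ** 3) * mu2 ** 2)
    return e_m4 == formula, e_est == mu4


if __name__ == "__main__":
    print(check_fourth_moment([0, 1, 3, 7], 5))
```

The sample size must exceed 3 for the estimate to exist.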

Consider a finite population consisting of $N$ items $x_1, \cdots, x_N$ and samples of $n$ items taken from that population, the sampling being done without replacement. We shall utilize the power product notation of P. S. Dwyer [1; p. 13]

\[ (q_1 \cdots q_r) = \sum_{i_1 \neq i_2 \neq \cdots \neq i_r}^{n} x_{i_1}^{q_1} \cdots x_{i_r}^{q_r} \tag{1} \]

to represent a power product formed for the sample and

\[ [q_1 \cdots q_r] = \sum_{i_1 \neq i_2 \neq \cdots \neq i_r}^{N} x_{i_1}^{q_1} \cdots x_{i_r}^{q_r} \tag{2} \]

to represent like power products formed for the population. An arbitrary moment function of weight $r$ of the sample is indicated by

\[ \Sigma\, a_{q_1^{\pi_1} \cdots q_s^{\pi_s}}\, \frac{r!}{(q_1!)^{\pi_1} \cdots (q_s!)^{\pi_s}\, \pi_1! \cdots \pi_s!}\, (q_1)^{\pi_1} \cdots (q_s)^{\pi_s} \tag{3} \]

and likewise a moment function of the population is indicated by

\[ \Sigma\, A_{q_1^{\pi_1} \cdots q_s^{\pi_s}}\, \frac{r!}{(q_1!)^{\pi_1} \cdots (q_s!)^{\pi_s}\, \pi_1! \cdots \pi_s!}\, [q_1]^{\pi_1} \cdots [q_s]^{\pi_s} \tag{4} \]

where the summation extends over all partitions of $r$.

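It now is convenient to express each of the expressions (3) and (4) in terms of power products. We shall utilize for this purpose an expansion theorem which is the converse of a theorem stated by Dwyer, [1; p. 34] and [2; pp. 37-39], which can be proved in a similar fashion.

This converse theorem follows:
If any isobaric sum of products of power sums indicated by

\[ \Sigma\, A_{p_1^{\pi_1} \cdots p_s^{\pi_s}}\, \frac{r!}{(p_1!)^{\pi_1} \cdots (p_s!)^{\pi_s}\, \pi_1! \cdots \pi_s!}\, [p_1]^{\pi_1} \cdots [p_s]^{\pi_s} \tag{5} \]

be expanded in terms of power products in a form indicated by

\[ \Sigma\, B_{p_1^{\pi_1} \cdots p_s^{\pi_s}}\, \frac{r!}{(p_1!)^{\pi_1} \cdots (p_s!)^{\pi_s}\, \pi_1! \cdots \pi_s!}\, [p_1^{\pi_1} \cdots p_s^{\pi_s}] \tag{6} \]

then the coefficient $B_r$ of the power sum $[r]$ is given by

\[ B_r = \Sigma\, \frac{r!}{(p_1!)^{\pi_1} \cdots (p_s!)^{\pi_s}\, \pi_1! \cdots \pi_s!}\, A_{p_1^{\pi_1} \cdots p_s^{\pi_s}} \tag{7} \]

and the coefficient $B_{r_1 r_2 \cdots r_m}$ of $[r_1 r_2 \cdots r_m]$ is

\[ B_{r_1 r_2 \cdots r_m} = \overline{B_{r_1} B_{r_2} \cdots B_{r_m}} \tag{8} \]

where the barred product indicates a symbolic multiplication by suffixing of subscripts.
This is exemplified by

\[ B_{32} = \overline{B_3 B_2} = \overline{(A_3 + 3A_{21} + A_{111})(A_2 + A_{11})} = A_{32} + A_{311} + 3A_{221} + 4A_{2111} + A_{11111}. \]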
Using this theorem the moment functions (3) and (4) are easily expanded in terms of power products. In this latter form the expected value of the sample moment function is easily found by utilizing the fact that

\[ \frac{E(q_1 \cdots q_r)}{n^{(r)}} = \frac{[q_1 \cdots q_r]}{N^{(r)}}, \]

where $n^{(r)}$ denotes $n(n-1)\cdots(n-r+1)$.

Now if the expected value of the sample moment function be equated to the population moment function (both being in power product form) we obtain a set of equations connecting the coefficients of a sample moment function and a population moment function. Since either the coefficients of the sample moment function or those of the population moment function may be assigned and the others solved for, this set of equations enables one to solve two problems. First, we may find unbiased estimates, that is, moment functions of the sample such that their expected value is some preassigned population moment function. Second, we may find expected values, that is, moment functions of the population such that they are expected values of some preassigned sample moment function. From the symmetry of this set of equations, we shall see that any result obtained from the system has, through the symmetry, a dual role.

The foregoing discussion may be clarified by an example. Let $A_2[2] + A_{11}[1]^2$ be the population moment function. In terms of power products this becomes $(A_2 + A_{11})[2] + A_{11}[11]$. The sample moment function $a_2(2) + a_{11}(1)^2$ becomes in terms of power products $(a_2 + a_{11})(2) + a_{11}(11)$ and its expected value is

\[ \frac{n}{N}\,(a_2 + a_{11})[2] + \frac{n^{(2)}}{N^{(2)}}\,a_{11}[11]. \]

By equating this to the population moment function above we obtain

\[ n^{(2)} a_{11} = N^{(2)} A_{11}, \qquad n(a_2 + a_{11}) = N(A_2 + A_{11}), \]

and the symmetry of the system is apparent.
If

\[ \tau_i = \frac{N^{(i)}}{n^{(i)}}, \qquad \rho_i = \frac{n^{(i)}}{N^{(i)}}, \]

the solutions of the system are

\[ \begin{aligned} a_{11} &= \tau_2 A_{11}, & A_{11} &= \rho_2 a_{11}, \\ a_2 &= \tau_1 A_2 + (\tau_1 - \tau_2) A_{11}, & A_2 &= \rho_1 a_2 + (\rho_1 - \rho_2) a_{11}. \end{aligned} \tag{9} \]
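The weight 2 example can be verified by complete enumeration. The following is a minimal sketch, assuming a hypothetical population of five values and samples of three drawn without replacement: it first checks the expected value rule for the power product $(21)$, then chooses $A_2 = 1/N$, $A_{11} = -1/N^2$ (the population variance with divisor $N$), solves the two equations for $a_2$ and $a_{11}$, and confirms that the resulting sample function is an unbiased estimate. The function names are not taken from the paper.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import prod


def falling(n, r):
    """n(n-1)...(n-r+1), written n^(r) in the text."""
    return prod(n - j for j in range(r))


def power_product(xs, q):
    """Sum over ordered sets of distinct items of x_i1^q1 * ... * x_ir^qr."""
    return sum(prod(xs[i] ** e for i, e in zip(idx, q))
               for idx in permutations(range(len(xs)), len(q)))


def expected(stat, population, n):
    """Exact mean of stat over all samples of n items drawn without replacement."""
    samples = list(combinations(population, n))
    return sum(Fraction(stat(s)) for s in samples) / len(samples)


population = (1, 2, 4, 7, 11)          # hypothetical finite population
N, n = len(population), 3

# Rule: E(q1...qr) / n^(r) = [q1...qr] / N^(r), here for (21).
lhs = expected(lambda s: power_product(s, (2, 1)), population, n) / falling(n, 2)
rhs = Fraction(power_product(population, (2, 1)), falling(N, 2))

# Population function A2[2] + A11[1]^2 and the solved sample coefficients.
A2, A11 = Fraction(1, N), Fraction(-1, N * N)
rho1, rho2 = Fraction(n, N), Fraction(falling(n, 2), falling(N, 2))
a11 = A11 / rho2
a2 = (A2 + A11) / rho1 - a11


def sample_function(s):
    return a2 * power_product(s, (2,)) + a11 * power_product(s, (1,)) ** 2


target = A2 * power_product(population, (2,)) + A11 * power_product(population, (1,)) ** 2

if __name__ == "__main__":
    print(lhs == rhs, expected(sample_function, population, n) == target)
```

Interchanging the roles of the sample and the population in the two equations gives the dual problem of finding the expected value.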
In a similar manner if we use moment functions of weight 3 we begin with

\[ A_3[3] + 3A_{21}[2][1] + A_{111}[1]^3, \]

\[ a_3(3) + 3a_{21}(2)(1) + a_{111}(1)^3, \]

and obtain the system of equations

\[ \begin{aligned} n^{(3)} a_{111} &= N^{(3)} A_{111}, \\ n^{(2)} (a_{21} + a_{111}) &= N^{(2)} (A_{21} + A_{111}), \\ n (a_3 + 3a_{21} + a_{111}) &= N (A_3 + 3A_{21} + A_{111}), \end{aligned} \]

with solutions

\[ \begin{aligned} A_{111} &= \rho_3 a_{111}, \\ A_{21} &= \rho_2 a_{21} + (\rho_2 - \rho_3) a_{111}, \\ A_3 &= \rho_1 a_3 + 3(\rho_1 - \rho_2) a_{21} + (\rho_1 - 3\rho_2 + 2\rho_3) a_{111}. \end{aligned} \tag{10} \]

The solutions for the $a$'s in terms of the $A$'s are obtainable from the given results in an obvious manner.


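If we use the Carver functions [3; p. 104]

\[ \begin{aligned} P_1 &= \rho_1, & P_{11} &= \rho_2, \quad \cdots \quad P_{1^r} = \rho_r, \\ P_2 &= \rho_1 - \rho_2, & P_{21} &= \rho_2 - \rho_3, \quad \cdots \\ P_3 &= \rho_1 - 3\rho_2 + 2\rho_3, & P_{22} &= \rho_2 - 2\rho_3 + \rho_4, \quad \cdots \\ P_4 &= \rho_1 - 7\rho_2 + 12\rho_3 - 6\rho_4, \end{aligned} \]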


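or in general

\[ P_r = \Sigma\, (-1)^{\pi - 1} (\pi - 1)!\, \frac{r!}{(q_1!)^{\pi_1} \cdots (q_s!)^{\pi_s}\, \pi_1! \cdots \pi_s!}\, \rho_\pi, \qquad \pi = \pi_1 + \cdots + \pi_s, \tag{11} \]

and

\[ P_{r_1 r_2 \cdots r_m} = \overline{\overline{P_{r_1} P_{r_2} \cdots P_{r_m}}}, \]

where the double barred product indicates a symbolic multiplication by addition of subscripts exemplified by

\[ P_{32} = \overline{\overline{P_3 P_2}} = \overline{\overline{(\rho_1 - 3\rho_2 + 2\rho_3)(\rho_1 - \rho_2)}} = \rho_2 - 4\rho_3 + 5\rho_4 - 2\rho_5; \]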
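the results (9) and (10) may be written

\[ \begin{aligned} A_2 &= P_1 a_2 + P_2 a_{11}, & A_3 &= P_1 a_3 + 3P_2 a_{21} + P_3 a_{111}, \\ A_{11} &= P_{11} a_{11}, & A_{21} &= P_{11} a_{21} + P_{21} a_{111}, \\ & & A_{111} &= P_{111} a_{111}. \end{aligned} \]

Similarly for weight 4 we obtain

\[ \begin{aligned} A_4 &= P_1 a_4 + 4P_2 a_{31} + 3P_2 a_{22} + 6P_3 a_{211} + P_4 a_{1111}, \\ A_{31} &= P_{11} a_{31} + 3P_{21} a_{211} + P_{31} a_{1111}, \\ A_{22} &= P_{11} a_{22} + 2P_{21} a_{211} + P_{22} a_{1111}, \\ A_{211} &= P_{111} a_{211} + P_{211} a_{1111}, \\ A_{1111} &= P_{1111} a_{1111}. \end{aligned} \]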
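The symbolic multiplication by addition of subscripts is ordinary multiplication of polynomials in which $\rho_i \rho_j$ is read as $\rho_{i+j}$. The following is a minimal sketch of it; it assumes that the coefficient of $\rho_k$ in $P_r$ is $(-1)^{k-1}(k-1)!$ times the number of ways of separating $r$ things into $k$ groups (a Stirling number of the second kind), an assumption which agrees with $P_1$ to $P_4$ as listed above but is not quoted from Carver. The function names are hypothetical.

```python
from math import factorial


def stirling2(r, k):
    """Number of ways of separating r distinct things into k non-empty groups."""
    if r == k:
        return 1
    if k == 0 or k > r:
        return 0
    return k * stirling2(r - 1, k) + stirling2(r - 1, k - 1)


def carver(r):
    """P_r as a dictionary {k: coefficient of rho_k} (assumed general form)."""
    return {k: (-1) ** (k - 1) * factorial(k - 1) * stirling2(r, k)
            for k in range(1, r + 1)}


def double_bar(*factors):
    """Double barred product: multiply out, adding the subscripts of the rho's."""
    result = {0: 1}
    for poly in factors:
        nxt = {}
        for i, c in result.items():
            for j, d in poly.items():
                nxt[i + j] = nxt.get(i + j, 0) + c * d
        result = nxt
    return {k: v for k, v in sorted(result.items()) if v != 0}


if __name__ == "__main__":
    print(carver(4))                         # coefficients of P_4
    print(double_bar(carver(3), carver(2)))  # coefficients of P_32
```

With these functions $P_{32}$ comes out as $\rho_2 - 4\rho_3 + 5\rho_4 - 2\rho_5$, in agreement with the example in the text.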











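In general

\[ A_r = \Sigma\, P_{\pi_1 + \pi_2 + \cdots + \pi_s}\, \frac{r!}{(p_1!)^{\pi_1} \cdots (p_s!)^{\pi_s}\, \pi_1! \cdots \pi_s!}\, a_{p_1^{\pi_1} p_2^{\pi_2} \cdots p_s^{\pi_s}}, \tag{12} \]

and

\[ A_{r_1 r_2 \cdots r_m} = \overline{A_{r_1} A_{r_2} \cdots A_{r_m}}, \tag{13} \]

where as before the barred product indicates a symbolic multiplication by suffixing of subscripts.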





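If in

\[ \Sigma\, a_{q_1^{\pi_1} \cdots q_s^{\pi_s}}\, \frac{r!}{(q_1!)^{\pi_1} \cdots (q_s!)^{\pi_s}\, \pi_1! \cdots \pi_s!}\, (q_1)^{\pi_1} \cdots (q_s)^{\pi_s} \tag{14} \]

we put

\[ a_{q_1^{\pi_1} \cdots q_s^{\pi_s}} = \frac{(-1)^{\pi_1 + \pi_2 + \cdots + \pi_s - 1}\,(\pi_1 + \pi_2 + \cdots + \pi_s - 1)!}{n^{\pi_1 + \pi_2 + \cdots + \pi_s}}, \tag{15} \]

the moment function of the sample which is thereby represented is the Thiele seminvariant $l_r$ of the sample. If the $A$'s are solved for by means of the appropriate set of equations the expected value of $l_r$ is found. Thus we find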


2 2) 
N’n' 


Ell] = aa) 
E\ls] = aah. 
Ell.) = ee ts on ante — N)(Nn — 6)ks, 
16 
™ rity = Tat — NE (n - NAN — 9 — NN — 1)u, 
E(ls] = oso + i (n — N)(Nn — 12)ks, 
E{l;l.] = en — es (n — N)(Nn — n — N — 5)ks, 


where the $\kappa$ system of seminvariants used here is defined by


1 2r ‘ 2 
Kor = 3 7. (—1) (77) wines 


i=0 


Korg = i (-( 2r ) 22 + 1 


itr/ite pie 


(17) 


By virtue of the symmetry noted earlier it follows that the estimates of the Thiele seminvariants and products of these seminvariants of weight $\le 5$ are obtainable from the last results by replacing $E$ by $E'$ (estimate of), $\kappa_i$ by $k_i$, $l_i$ by $\lambda_i$, and $N$ by $n$. In this manner we find that $L_4$, the estimate of $\lambda_4$, is
n'N. (4) N” n? 


(18) Ly = E'(M) = aa + yaw 


(N — n)(Nn — 6)kg. 

It is of some interest to note in the results (16) above that in those expected values or estimates which contain more than one term the factor $N - n$ occurs in the second term. This, and the form of other coefficients involved in the terms, shows that as the sample size approaches the population size the sample seminvariants approach the population seminvariants. Another characteristic of such results as those given in (16) is that infinite sampling formulas are easily obtainable therefrom. Thus if in $L_4$ given in (18) $N \to \infty$, we find


In 


a 3 
n n 
@ 4 t oe hs 


a n'(n + 1) a 3n*(n —1) 2 


n@) n® —e 


the first of these forms checking the result given by Dressel [4; p. 45] and the 
second form being identical with that given by Fisher [5]. 

The results exhibited above for finite sampling may lead to a mistaken idea 
about the simplicity of the results. Simplicity decreases rapidly as the weight 
increases. Thus for weight 6 we find 


Nn® (6) 2N* (4) 
Elle) = rare 88 + Space (m — N)(Nn — 20)[8u6 — 15pane + 10us — 45y3] 
N?n 
(19) eas -[11y6 + 105p4y2 — 50 ws + 60p3] 
- rang (n —-N)[Nn(N? + nN +72) — 14nN(N +n) + 71Nn — 120] 
10Nn® 6n an 
+ Wont (n —N) + yas  — NIN +0 — 5) — Gay — N) Ke. 


Again by letting $N \to \infty$ infinite sampling results are obtained. Much of this last result vanishes in that case.

It has been demonstrated that the $\kappa$ system of seminvariants is invariant under estimation in the case of infinite sampling [4; p. 53]. It is therefore of some interest to note that this system also possesses the property for finite sampling without replacement. The proof of this is quite simple. Denote the estimate of $\kappa_i$ by $K_i$ and the fundamental relations are

\[ K_{2r} = \frac{n^2}{n^{(2)}}\,k_{2r}, \qquad K_{2r+1} = \frac{n^3}{n^{(3)}}\,k_{2r+1}. \]

These expressions hold for any $n$ and hence for a population of $N$. Let $K'_{2r}$ and $K'_{2r+1}$ denote functions corresponding to $K_{2r}$ and $K_{2r+1}$ but with population moments replacing sample moments and we have

\[ K'_{2r} = \frac{N^2}{N^{(2)}}\,\kappa_{2r}, \qquad K'_{2r+1} = \frac{N^3}{N^{(3)}}\,\kappa_{2r+1}. \]

Since the power product mode of formulation of $K_{2r}$ and $K_{2r+1}$ insures that

\[ E[K_{2r}] = K'_{2r}, \qquad E[K_{2r+1}] = K'_{2r+1}, \]

it follows that

\[ E[K_{2r}] = E\left[\frac{n^2}{n^{(2)}}\,k_{2r}\right] = K'_{2r} = \frac{N^2}{N^{(2)}}\,\kappa_{2r} \]

or

\[ E[k_{2r}] = \frac{n^{(2)} N^2}{n^2 N^{(2)}}\,\kappa_{2r}. \]

Similarly

\[ E[k_{2r+1}] = \frac{n^{(3)} N^3}{n^3 N^{(3)}}\,\kappa_{2r+1}, \]

thus establishing the theorem stated above.


REFERENCES 


[1] P. S. Dwyer, "Combined expansions of products of symmetric power sums and sums of symmetric power products with applications to sampling," Part I, Annals of Math. Stat., Vol. 9 (1938), pp. 1-47. Part II, Vol. 9 (1938), pp. 97-132.

[2] P. S. Dwyer, "Moments of any rational integral isobaric sample moment function," Annals of Math. Stat., Vol. 8 (1937), pp. 21-65.

[3] H. C. Carver, "Fundamentals of the theory of sampling," Annals of Math. Stat., Vol. 1 (1930), pp. 101-121; 260-274.

[4] P. L. Dressel, "Seminvariants and their estimates," Annals of Math. Stat., Vol. 11 (1940), pp. 33-57.

[5] R. A. Fisher, "Moments and product moments of sampling distributions," Proc. Lond. Math. Soc., Vol. 2 (30), (1929), pp. 199-238.














DETERMINATION OF SAMPLE SIZES FOR SETTING 
TOLERANCE LIMITS 


By S. S. WILKS


Princeton University, Princeton, N. J. 


1. Introduction. In the mass production of a given product or apparatus piece-part, Shewhart$^1$ has discussed a practical procedure for detecting the existence of assignable causes of variation in a given quality characteristic of the product as measured by a variable $x$. For example, $x$ may be the thickness in inches of a washer or the tensile strength in pounds of a small aluminum casting made according to a given set of specifications; $x$ varies in value from washer to washer or from casting to casting. Now suppose assignable causes of variability in $x$ have been detected by Shewhart's procedure and have been sufficiently well eliminated by making appropriate refinements in the manufacturing process so that for all practical purposes the remaining variability may be considered "random," thus allowing us to assume that we have a statistical universe $U$ in which $x$ is a random variable with some distribution law $f(x)$. $f(x)$ is, in general, unknown and cannot be determined until long after the refined manufacturing operation has been under way. Two types of situations arise in practice, one in which $x$ is a discrete variable taking on only certain isolated values as for example $1, 2, 3, \cdots$, etc. with corresponding probabilities $p(1), p(2), \cdots$, the other being that in which $x$ is essentially a continuous variable over some range with a corresponding probability density function $f(x)$. In this paper we shall consider the latter type of variable.

The problem now arises as to how we should calculate a tolerance range $(L_1, L_2)$ for $x$ from a sample, and how large the sample should be in order for the tolerance range to have a given degree of stability. More specifically, for a given method of calculating tolerance limits, how large should our sample be in order that the proportion $P$ of the universe included between $L_1$ and $L_2$ have an average value $a$, and be such that the probability is at least $p$ that $P$ will lie between two given numbers, say $b$ and $c$? For example, if a tolerance range is obtained by using a truncated sample range, that is by letting $L_1$ be the greatest of the $r$ smallest values in a sample and $L_2$ the smallest of the $r$ largest values, $r$ being chosen so that $E(P) = .99$, how large should the sample size, say $n$, be in order for the probability to be .9 that $P$ would lie between .985 and .995? A similar question can be asked when the setting of only one tolerance limit is under consideration.


$^1$ W. A. Shewhart, Economic Control of Quality of Manufactured Product, D. Van Nostrand Company, New York, 1931.



































2. Tolerance ranges from truncated sample ranges. Suppose that nothing is known about the distribution function $f(x)$ except enough to enable us to assume that it is continuous. Let $a$ be the average value which $P$ is to have, and suppose a sample of size $n$ is drawn from the universe $U$ so that $[(1 - a)(n + 1)]/2 = r$, say, is a positive integer. Let $x_1, x_2, \cdots, x_n$ be the sample values of $x$ arranged in order of increasing magnitude. Let $L_1 = x_r$ and $L_2 = x_{n-r+1}$. The distribution law, say $g(P)$, of $P$, the proportion of the universe included between these values of $L_1$ and $L_2$, is given by

\[ g(P)\,dP = \frac{\Gamma(n+1)}{\Gamma[a(n+1)]\,\Gamma[(1-a)(n+1)]}\,P^{a(n+1)-1}(1-P)^{(1-a)(n+1)-1}\,dP. \tag{1} \]

This follows at once from the joint distribution law of $x_r$ and $x_{n-r+1}$ which can be derived as follows: Consider the $x$ axis as being divided into $k$ mutually exclusive intervals $I_1, I_2, \cdots, I_k$ with $p_1, p_2, \cdots, p_k$ as the associated probabilities $\left(\sum_1^k p_i = 1\right)$. In a sample of size $n$ the probability that $n_1, n_2, \cdots, n_k$ $\left(\sum_1^k n_i = n\right)$ values of $x$ will fall into $I_1, I_2, \cdots, I_k$ respectively is given by the well-known multinomial distribution law

\[ \frac{n!}{n_1!\,n_2! \cdots n_k!}\,p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}. \tag{2} \]

To get the distribution of $x_r$ and $x_{n-r+1}$ we take $k = 5$ and for $I_1, I_2, \cdots, I_5$ we take the intervals $(-\infty, x_r)$, $(x_r, x_r + dx_r)$, $(x_r + dx_r, x_{n-r+1})$, $(x_{n-r+1}, x_{n-r+1} + dx_{n-r+1})$, $(x_{n-r+1} + dx_{n-r+1}, \infty)$ respectively. The values of $p_1, p_2, \cdots, p_5$ are the integrals of $f(x)\,dx$ over these five intervals respectively and the values of $n_1, n_2, \cdots, n_5$ are $r - 1$, $1$, $n - 2r$, $1$, $r - 1$ respectively. By substituting these values of the $p$'s and $n$'s in (2) and neglecting terms of order higher than $dx_r\,dx_{n-r+1}$, the probability element for $x_r$ and $x_{n-r+1}$ is found at once to be$^2$

\[ \frac{n!}{[(r-1)!]^2\,(n-2r)!}\left(\int_{-\infty}^{x_r} f(x)\,dx\right)^{r-1}\left(\int_{x_{n-r+1}}^{\infty} f(x)\,dx\right)^{r-1}\left(\int_{x_r}^{x_{n-r+1}} f(x)\,dx\right)^{n-2r} f(x_r)\,f(x_{n-r+1})\,dx_r\,dx_{n-r+1}. \tag{3} \]

Now let $\int_{-\infty}^{x_r} f(x)\,dx = u$, $\int_{x_{n-r+1}}^{\infty} f(x)\,dx = v$; then since $du = f(x_r)\,dx_r$ and $dv = -f(x_{n-r+1})\,dx_{n-r+1}$, the probability element of $u$ and $v$ may be written as

\[ \frac{\Gamma(n+1)}{\Gamma^2(r)\,\Gamma(n-2r+1)}\,u^{r-1} v^{r-1} (1-u-v)^{n-2r}\,du\,dv, \tag{4} \]
$^2$ For a discussion and a rather complete bibliography of the probability theory of "extreme values" such as $x_r$ and $x_{n-r+1}$ see E. J. Gumbel, "Les valeurs extrêmes des distributions statistiques," Annales de l'Institut H. Poincaré (1935).














the region of $u$ and $v$ of non-zero probability being the triangle bounded by the $u$ and $v$ axes and the line $u + v = 1$. Making the change of variables $1 - u - v = P$ and $u = Q$, integrating with respect to $Q$, and setting $r = (1/2)(1 - a)(n + 1)$ we find the distribution of $P$, the proportion of the universe included between $x_r$ and $x_{n-r+1}$, to be (1). It should be remarked that even if $L_1$ and $L_2$ are obtained by asymmetrical truncation by taking $L_1 = x_s$, $L_2 = x_t$ where $t - s = n - 2r + 1$, the distribution of $P = \int_{x_s}^{x_t} f(x)\,dx$ remains unchanged.

Thus for a given $p$, by taking $L_1 = x_s$ and $L_2 = x_t$ where $t - s = n - 2r + 1 = a(n + 1)$, and choosing the smallest value of $n$ for which $\int_b^c g(P)\,dP \ge p$ and such that $(1 - a)(n + 1)$ is a positive integer, we have provided the answer to the italicized question for one method of calculating $L_1$ and $L_2$; a method which is valid for any unknown continuous distribution $f(x)$.

As an example, suppose we take $a = .99$, $b = .985$, $c = .995$ and $p = .99$. The size of sample required is found to be 1000 (999 to be exact). In fact in this case the probability of $P$ being between .985 and .995 is .992. In this example, we may therefore make the statement that if $x$ is a continuous variable under statistical control, and if samples of size 1000 are taken, the tolerance limits $L_1$ and $L_2$, taken as the fifth smallest and fifth largest values of $x$ in the sample respectively, will, on the average, include 99% of the universe between them and furthermore, the tolerance limits calculated in this way for samples of size 1000 will, in about 99.2% of the samples, include between 98.5% and 99.5% of the universe between them.
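The integral $\int_b^c g(P)\,dP$ needs no tables of the incomplete Beta function when $a(n+1)$ and $(1-a)(n+1)$ are integers, since the integral of (1) from 0 to $x$ is then equal to a tail of the binomial series. The following is a minimal sketch resting on that identity; the function names are hypothetical, and it is offered as a means of checking figures of the kind quoted in these examples rather than as a reproduction of the author's computations.

```python
from math import comb


def beta_integral(x, alpha, beta):
    """Integral of the Beta(alpha, beta) law from 0 to x, for positive integers.

    Uses the identity I_x(alpha, beta) = P{Binomial(alpha + beta - 1, x) >= alpha}.
    Intended for cases where beta is small, so that only a few terms are summed.
    """
    m = alpha + beta - 1
    return sum(comb(m, k) * x ** k * (1.0 - x) ** (m - k)
               for k in range(alpha, m + 1))


def coverage_probability(n, r, b, c):
    """Probability that P lies between b and c when L1 = x_r and L2 = x_(n-r+1).

    By (1), P follows the Beta law with exponents n - 2r + 1 and 2r.
    """
    alpha, beta = n - 2 * r + 1, 2 * r
    return beta_integral(c, alpha, beta) - beta_integral(b, alpha, beta)


def expected_coverage(n, r):
    """Average value of P, namely (n - 2r + 1) / (n + 1)."""
    return (n - 2 * r + 1) / (n + 1)


if __name__ == "__main__":
    n, r = 999, 5
    print(expected_coverage(n, r))
    print(coverage_probability(n, r, 0.985, 0.995))
```

The smallest admissible $n$ for given $a$, $b$, $c$, $p$ may then be found by trying successive values of $n$ for which $(1 - a)(n + 1)/2$ is a positive integer.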

If $L_1$ and $L_2$ are taken as the smallest and largest values of $x$ in the sample
respectively (corresponding to $r = 1$, i.e. sample range with no truncation),
then in samples of size 1000, these tolerance limits will, on the average, include
99.8% of the universe between them and the probability is .996 that $L_1$ and $L_2$
will include at least 99.5% of the universe between them. If the largest and
smallest values of $x$ in samples are used as tolerance limits and if we wish to
state that the probability is .99 that such tolerance limits will include at least
99% of the universe, the size of sample required is 660. If the probability is
lowered to .95 of including at least 99% of the universe, with such tolerance
limits, the size of sample required is 130. Engineering statisticians³ have
pointed out on the basis of practical experience the need of using samples of 100 to
1000 or even more cases in order to set tolerance limits which will include at
least 99% of the universe with a satisfactorily high degree of certainty. The
examples we have given based on sizes 1000, 660 and 130 will indicate the degree
of stability to be expected for tolerance ranges for samples in this range of sizes.
The degree of stability of the tolerance limits for samples of the size range 500
to 1000 appears to be of about the order of that demanded by the engineering
statistician.
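For the untruncated range ($r = 1$) the search for the required sample size can be sketched in a few lines. This is an illustration under the assumption that the proportion between the smallest and largest values follows the Beta law with parameters $n - 1$ and $2$, so that $\Pr(P \geq \beta) = 1 - n\beta^{n-1} + (n-1)\beta^{n}$; the function names are hypothetical.

```python
def prob_range_covers(n, beta):
    """P(proportion between sample minimum and maximum >= beta), assuming Beta(n-1, 2)."""
    return 1.0 - (n * beta ** (n - 1) - (n - 1) * beta ** n)

def min_sample_size(beta, p):
    """Smallest n for which the range covers at least beta with probability >= p."""
    n = 2
    while prob_range_covers(n, beta) < p:
        n += 1
    return n
```

For instance, `min_sample_size(0.99, 0.99)` searches for the sample size at which the range includes at least 99% of the universe with probability .99.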


3 Cf. W. A. Shewhart, Statistical Methods from the Point of View of Quality Control, The
Graduate School of the U.S. Department of Agriculture, Washington (1939), p. 63.


















In some cases it may be desirable to determine the size of samples so as to
control the tolerance limits $L_1$ and $L_2$ individually, that is so that the probability
is at least $p$ that the proportions of the universe contained in the tails of the
distribution cut off by $L_1$ and $L_2$ are in both cases between two given numbers,
say $d$ and $e$. In this case we would determine the least value of $n$ so that

(5)    $\int_d^e \int_d^e h(u, v)\,du\,dv \geq p$

where $h(u, v)\,du\,dv$ denotes the function given by (4). For example, suppose
$p = .99$, $d = 0$, $e = .005$, $r = 1$. The size of the sample needed is 1060.
Thus in samples of size 1060, the probability is .99 that $L_1$ and $L_2$ taken as
the smallest and the largest values in the sample respectively will cut off tails
of the universe such that each tail will include not more than 0.5% of the universe.

If it is desired to set only one tolerance limit, say $L_1$, then the distribution
of $u$ would be used. This can be found by integrating (4) with respect to $v$.
The distribution is

(6)    $\frac{\Gamma(n + 1)}{\Gamma(r)\,\Gamma(n - r + 1)}\, u^{r-1} (1 - u)^{n-r}\,du.$

The probability $p$ that the proportion of the universe in the tail which will be
cut off by $L_1$ is between $d$ and $e$ is given by integrating the expression (6) from
$d$ to $e$. The value of $n$ required to obtain any given value of $p$ can then be
determined. For example, in the case where $p = .99$, $d = 0$, $e = .005$, $r = 1$,
the size of the sample needed is 920.
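The two sample-size examples for the tails can be checked with a short calculation. The sketch below assumes $r = 1$ and $d = 0$, so that (6) reduces to $n(1-u)^{n-1}$, and it assumes that (4) then has the form $h(u, v) = n(n-1)(1-u-v)^{n-2}$ on the triangle $u + v \leq 1$; the function names are illustrative only.

```python
import math

def n_one_tail(e, p):
    """Smallest n with P(u <= e) >= p when r = 1, using P(u <= e) = 1 - (1 - e)^n."""
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - e))

def prob_both_tails(n, e):
    """P(u <= e and v <= e) for r = 1 and e < 1/2, by inclusion and exclusion."""
    return 1.0 - 2.0 * (1.0 - e) ** n + (1.0 - 2.0 * e) ** n

def n_both_tails(e, p):
    """Smallest n for which both tails are at most e with probability >= p."""
    n = 2
    while prob_both_tails(n, e) < p:
        n += 1
    return n
```

With `e = 0.005` and `p = 0.99` both functions return values close to the round figures 920 and 1060 quoted in the text.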





3. Tolerance range for a normal universe. The method of setting tolerance
limits discussed in Section 2 assumes nothing about the distribution $f(x)$ except
that it is continuous. If $f(x)$ can be assumed to have a given functional form
involving unknown parameters, methods based on the theory of statistical
estimation and having greater efficiency than those already discussed could be
used for setting tolerance limits. We shall not go into a general discussion of
such methods here although it does appear desirable to consider one very
important example of the application of the methods. Suppose $f(x)$ can be assumed
to be a normal distribution function with unknown mean $m$ and variance $\sigma^2$.
In a sample of size $n$ let $\bar{x}$ be the sample mean and let
$s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2/(n - 1)$.
Let us consider as tolerance limits $L_1$ and $L_2$ the quantities $\bar{x} \pm ks$. The
proportion $P'$ of the universe included between these limits is

(7)    $P' = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{\bar{x}-ks}^{\bar{x}+ks} e^{-(x-m)^2/(2\sigma^2)}\,dx.$

We wish to determine $k$ so that $E(P') = a$. It can be verified by
straightforward analysis that $E(P')$, defined by
$\int_{-\infty}^{\infty} \int_0^{\infty} P' f(\bar{x}, s)\,ds\,d\bar{x}$, has the value

(8)    $\frac{\Gamma(n/2)}{\sqrt{(n-1)\pi}\;\Gamma((n-1)/2)} \int_{-t}^{t} \frac{dx}{\left(1 + \frac{x^2}{n-1}\right)^{n/2}}, \qquad t = k\sqrt{\frac{n}{n+1}},$

where $f(\bar{x}, s)$ is the well-known distribution of $\bar{x}$ and $s$ given by

(9)    $\frac{\sqrt{n}\,(n-1)^{(n-1)/2}\, s^{n-2}}{2^{(n-2)/2}\, \sigma^{n} \sqrt{\pi}\; \Gamma((n-1)/2)}\; e^{-[n(\bar{x}-m)^2 + (n-1)s^2]/(2\sigma^2)}.$

Therefore the tolerance limits $L_1$ and $L_2$ which will include, on the average,
a proportion $a$ of the universe between them are

(10)    $\bar{x} \pm t_a \sqrt{(n+1)/n}\cdot s$

where $t_a$ is the value of $t$ for which the integral in (8) has the value $a$. The
value of $t_a$ can be found from Fisher's $t$-table for $n - 1$ degrees of freedom, and
for certain values of $a$ including .99, .95, etc. and for values of $n$ up to 30.
Although the tolerance limits (10) will include, on the average, the proportion $a$
of the universe between them, we must now investigate the size of sample
needed to obtain a given degree of stability of $P'$. The exact distribution of $P'$
seems to be too complicated to be of any practical value. It is not difficult to
verify that to within terms of order $1/n$, the variance of $P'$ is given by

(11)    $\sigma_{P'}^2 = t_a^2\, e^{-t_a^2}/(\pi n).$

The variance of $P$, the proportion of the universe included between $x_r$ and
$x_{n-r+1}$, to within terms of order $1/n$ is given by

(12)    $\sigma_P^2 = a(1 - a)/n.$
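The statement that the limits (10) include, on the average, a proportion $a$ of a normal universe can be checked by simulation. The sketch below is an illustration only: the sample size, the number of repetitions and the seed are arbitrary choices, and the value of $t_a$ must be supplied from a $t$-table for $n - 1$ degrees of freedom (for example 2.262 for $n = 10$, $a = .95$).

```python
import math
import random
from statistics import NormalDist, mean, stdev

def mean_coverage(n, t_a, reps=4000, seed=1):
    """Monte Carlo estimate of E(P') for the limits xbar +/- t_a*sqrt((n+1)/n)*s.

    Samples are drawn from a standard normal universe; P' is the normal
    probability mass lying between the two limits.
    """
    rng = random.Random(seed)
    universe = NormalDist()              # m = 0, sigma = 1
    k = t_a * math.sqrt((n + 1) / n)
    total = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar, s = mean(xs), stdev(xs)    # stdev divides by n - 1, as s does in the text
        total += universe.cdf(xbar + k * s) - universe.cdf(xbar - k * s)
    return total / reps
```

The individual values of $P'$ scatter around $a$ from sample to sample; it is this scatter that (11) describes for large $n$.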


For a large sample of a given size, say $n = 100$ or more, a simple comparison
of the stabilities of the two tolerance ranges $(x_r, x_{n-r+1})$ and
$(\bar{x} \pm t_a\sqrt{(n+1)/n}\cdot s)$ can be made by comparing $\sigma_P^2$ and
$\sigma_{P'}^2$. For $a = .99$, the efficiency ratio $\sigma_{P'}^2/\sigma_P^2$
is .28 indicating that for large $n$ and when the universe is normal, samples of
size $.28n$ have the same degree of stability in setting tolerance ranges (10) as a
sample of size $n$ has when $(x_r, x_{n-r+1})$ is taken as the tolerance range. The same
thing may be viewed in another way: The fact that the range of values of $P'$ is
0 to 1 suggests that we may be able to get a fairly close approximation to the
true distribution of $P'$ by fitting a Pearson Type I function of the form

(13)    $\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, P'^{\,\alpha-1} (1 - P')^{\beta-1},$

determining $\alpha$ and $\beta$ by equating the mean and variance of the distribution (13)
to the mean and variance of $P'$ respectively. Accordingly we find

(14)    $\alpha = [a^2(1 - a) - a\,\sigma_{P'}^2]/\sigma_{P'}^2, \qquad \beta = [a(1 - a)^2 - (1 - a)\,\sigma_{P'}^2]/\sigma_{P'}^2.$

Thus it will be seen from (14) that in order for the fitted distribution (13) to be
identical with the distribution (1) a sample of only
$\frac{t_a^2\, e^{-t_a^2}}{\pi a(1 - a)}\,(n + 2)$ cases is needed.

In case only one tolerance limit is to be set, e.g. $\bar{x} - t_a\sqrt{(n+1)/n}\cdot s$, the
proportion, say $u'$, of the universe which will be included in the tail has mean
value $(1 - a)/2$ and variance $\frac{t_a^2 + 2}{4\pi n}\, e^{-t_a^2}$ (approximately) for large $n$. The
ratio of this variance to that of $u$, which is approximately $(1 - a^2)/4n$ for
large $n$, gives the efficiency of using $x_r$ for the lower tolerance limit in case of a
normal universe. For example, if $a = .99$, the efficiency is .18.
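The two efficiency figures, .28 and .18, can be reproduced from the large-sample variances. The sketch below takes the variance of $P'$ as $t_a^2 e^{-t_a^2}/(\pi n)$ and, as an assumption where the printed expression is hard to read, the variance of the one-sided tail proportion as $(t_a^2 + 2)e^{-t_a^2}/(4\pi n)$; for large $n$ the normal deviate is used in place of $t_a$.

```python
import math
from statistics import NormalDist

def efficiency_two_sided(a):
    """Ratio of the variance of P' to a(1 - a)/n, the variance of P (large n)."""
    t = NormalDist().inv_cdf((1.0 + a) / 2.0)
    return t * t * math.exp(-t * t) / (math.pi * a * (1.0 - a))

def efficiency_one_sided(a):
    """Ratio of the assumed variance of u' to (1 - a^2)/(4n), the variance of u (large n)."""
    t = NormalDist().inv_cdf((1.0 + a) / 2.0)
    return (t * t + 2.0) * math.exp(-t * t) / (math.pi * (1.0 - a * a))
```

Calling both functions with `a = 0.99` gives ratios that round to the .28 and .18 quoted above; the sample size $n$ cancels from each ratio.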


It is perhaps appropriate here to point out the distinction between confidence
limits and tolerance limits. It is well-known that in a sample from a normal
universe with mean $m$ the probability is $a$ that the confidence limits
$\bar{x} \pm t_a s/\sqrt{n}$ will include the population mean $m$ between them. The tolerance
limits $\bar{x} \pm t_a\sqrt{(n+1)/n}\cdot s$, on the other hand, are used to estimate the middle
$100a\%$ of the universe. Although the tolerance limits $\bar{x} \pm t_a\sqrt{(n+1)/n}\cdot s$ are much
more stable for a given sample size than those given by $x_r$ and $x_{n-r+1}$, in case
of a normal distribution, it should be emphasized that in case of even slight
non-normality, particularly when skewness is present, the former pair of limits
are apt to give very erroneous results with reference to the proportion of the
universe included in the tails. Confidence limits estimating $m$ are probably
much less sensitive to skewness than tolerance limits estimating the middle
$100a\%$ of the universe, particularly when $a$ is nearly unity.

Another important aspect of the problem of setting tolerance limits is the 
following: Suppose small samples of a given size are taken from a universe 
under statistical control. How many of these small samples should be taken 
as a basis for determining tolerance limits $L_1$ and $L_2$ of some function, say $g$,
of the samples (e.g. the sum of the measurements in each sample) so that the
proportion of samples in the universe of such samples having values of $g$ between
$L_1$ and $L_2$ will have a given mean with a given degree of stability? One obvious
approach to this question is to consider a universe of samples in the same manner 
in which we have considered a universe of individuals throughout the present 
paper. This approach, however, does not make very efficient use of the observa- 
tions, but we shall not enter into a treatment of the problem here. This problem 
and various related problems in the statistical methods of mass production 
remain to be studied. 


4. Summary. A method based on truncated sample ranges for determining 
size of sample required for setting tolerance limits on a random variable x having 
any unknown continuous distribution f(x) and having a given degree of stability 
is given. A method for setting tolerance limits corresponding to a given degree 
of stability in case f(x) is normal is discussed and a comparison of the stabilities 
of the tolerance limits set by the two methods in the normal case is made. 
Illustrative examples of the methods are given. 





ON A CERTAIN CLASS OF ORTHOGONAL POLYNOMIALS 


By Frank S. BEALE 
Lehigh University, Bethlehem, Pennsylvania 


Introduction. E. H. Hildebrandt has demonstrated the following theorem¹:
If $y$ is a non-identically zero solution of the Pearsonian Differential Equation,

(1)    $\frac{1}{y}\frac{dy}{dx} = \frac{N}{D} = \frac{a_0 + a_1 x}{b_0 + b_1 x + b_2 x^2}$, $a_i$, $b_i$ real, then

(2)    $\frac{D^{n-k}}{y}\,\frac{d^n}{dx^n}(D^k y) = P_n(k, x)$, $n$, $k$ integers, $n \geq 0$, is a

polynomial in $x$ of degree $n$ at most. Hildebrandt has obtained various relations
connecting the $P_n(k, x)$ and their derivatives as well as a recurrence relation.

If in (2) we set $k = n$ there results from a proper choice of $N$ and $D$ in (1),
the classical Hermite, Laguerre, Jacobi and Legendre Polynomials. Many
properties of these classical polynomials have been obtained by numerous
investigators.²

One of the most important of these properties is that of orthogonality which
can be stated as follows: Consider a sequence of the classical polynomials
$\Phi_i(x) = x^i + \cdots$. There exists an interval $(a, b)$ finite or infinite and a unique
weight function $\psi(x)$, monotonic non-decreasing over $(a, b)$ such that,

(3)    $\int_a^b \Phi_m(x)\,\Phi_n(x)\,d\psi(x) = 0, \qquad m \neq n.$

In the future we will refer to the type of orthogonality given by (3) with $\psi(x)$
monotonic non-decreasing as orthogonality in the restricted sense. In order to determine
whether a given system of polynomials is orthogonal in the restricted sense we
have the following theorem:³

1 E. H. Hildebrandt, "Systems of polynomials connected with the Charlier expansions,
etc.," Annals of Math. Stat., Vol. 2 (1931), pp. 379-439.

2 For an account of these properties as well as an extensive bibliography the reader can
refer to one of two treatises viz.: J. Shohat, Théorie Générale des Polynomes Orthogonaux de
Tchebichef, Mémorial des Sciences Mathématiques, Fascicule 66, Paris, Gauthier-Villars,
1936.
Gabor Szegö, Orthogonal Polynomials, Am. Math. Soc., Colloquium Publications, Vol.
23, 1939.

3 J. Shohat, "The relation of the classical orthogonal polynomials to the polynomials of
Appell," Am. Jour. of Math., Vol. 58 (1936), pp. 454-455.

THEOREM 1. In order that the sequence of polynomials
$\Phi_i(x) = x^i - S_i x^{i-1} + \cdots$, $i = 1, 2, 3, \cdots$ with real coefficients be orthogonal
in the restricted sense it is necessary and sufficient that there exist a recurrence relation,

(4)    $\Phi_i(x) = (x - c_i)\,\Phi_{i-1}(x) - \lambda_i\,\Phi_{i-2}(x), \qquad \Phi_0 = 1, \quad \Phi_1 = x - c_1,$

$c_i$, $\lambda_i$ const. with all $\lambda_i > 0$, $i \geq 2$.

With Shohat⁴ we will say that a system of polynomials
$\Phi_i(x) = x^i - S_i x^{i-1} + \cdots$, $i = 1, 2, 3, \cdots$, with real coefficients is orthogonal
in the general sense if there exists at least one weight function $\psi(x)$, of bounded
variation over $(a, b)$ such that (3) is satisfied. In connection with generalized
orthogonality we have the following theorem:⁴

THEOREM 2. In order that the system $\Phi_i(x)$, $i = 1, 2, 3, \cdots$ be orthogonal in
the general sense it is necessary and sufficient that relation (4) be satisfied with all
$\lambda_i \neq 0$.
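Theorems 1 and 2 reduce the question of orthogonality to an inspection of the constants $\lambda_i$ in (4). The following sketch is an illustration and not part of the paper: it builds the polynomials from given constants by the recurrence (4) and classifies a finite list of $\lambda_i$. The function names and the convention that list index $i$ holds $c_i$ or $\lambda_i$ are assumptions made for the example.

```python
from fractions import Fraction

def build_monic(c, lam, n_max):
    """Coefficient lists (lowest degree first) of Phi_0 .. Phi_{n_max} from (4).

    c[i] and lam[i] hold c_i and lambda_i; c[0], lam[0] and lam[1] are unused.
    """
    polys = [[Fraction(1)], [Fraction(-c[1]), Fraction(1)]]
    for i in range(2, n_max + 1):
        prev, prev2 = polys[i - 1], polys[i - 2]
        new = [Fraction(0)] + prev              # x * Phi_{i-1}
        for j, coef in enumerate(prev):
            new[j] -= c[i] * coef               # - c_i * Phi_{i-1}
        for j, coef in enumerate(prev2):
            new[j] -= lam[i] * coef             # - lambda_i * Phi_{i-2}
        polys.append(new)
    return polys

def classify(lams):
    """Classify lambda_2, lambda_3, ... (a finite list) by Theorems 1 and 2."""
    if all(v > 0 for v in lams):
        return "restricted"
    if all(v != 0 for v in lams):
        return "general"
    return "neither"
```

With $c_i = 0$ and $\lambda_i = i - 1$ the recurrence yields $x^2 - 1$ and $x^3 - 3x$, the familiar Hermite case, and `classify` reports orthogonality in the restricted sense for any finite list of these constants. A finite list can of course only illustrate the theorems, which require the condition for all $i$.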

It is the purpose of this paper to investigate the orthogonality properties of 
the general polynomials P,(n, x) given by (2). In Part 1 a general recurrence 
relation is derived which applies to all the polynomials P,(k, x). In Part 2 all 
the different types of orthogonal polynomials P,(n, x) are determined by making 
use of the general recurrence relation derived in Part 1. We also show, following
lines laid down by Hahn⁵, that the only systems of polynomials with simple
zeros which are orthogonal in either the restricted or the general sense and whose 
derivatives are orthogonal in either sense are the systems considered in Part 2. 


1. The general recurrence relation. From (2) we can write,

(5)    $P_{n-1}(k, x) = \frac{D^{n-k-1}}{y}\,\frac{d^{n-1}}{dx^{n-1}}(D^k y) = \frac{D^{n-k-1}}{y}\,\frac{d^{n-1}}{dx^{n-1}}[D \cdot D^{k-1} y].$

Apply Leibnitz Formula to the right side and make use of (2). There results,

(6)    $P_{n-1}(k, x) = P_{n-1}(k - 1, x) + (n - 1)D' P_{n-2}(k - 1, x) + \frac{(n-1)(n-2)}{2}\, D'' D\, P_{n-3}(k - 1, x).$

From Hildebrandt's paper we have,⁶

(7)    $P_{n+1}(k + 1, x) = [N + (k + 1)D']\,P_n(k, x) + n[N' + (k + 1)D'']\,D\,P_{n-1}(k, x).$

Decrease $k$ and $n$ each by one in (7) and obtain a relationship which we number
(8). Again decrease $n$ by one in (8) and get a relation which we number (9).


4 J. Shohat, "Sur les polynomes orthogonaux généralisés," Comptes Rendus, Vol. 207
(1938), p. 556.

5 Wolfgang Hahn, "Über die Jacobischen Polynome und zwei verwandte Polynomklassen,"
Math. Zeits., Vol. 39 (1934-35), pp. 634-638.

6 E. H. Hildebrandt, loc. cit. p. 407.





From (6), (7), (8) and (9) eliminate $P_{n-1}(k, x)$, $P_{n-2}(k - 1, x)$, and
$P_{n-3}(k - 1, x)$. There results,

(10)    $[2N' + (2k - n + 1)D''][N' + kD'']\,P_{n+1}(k + 1, x)$
        $= \{[2N' + (2k - n + 1)D''][N' + kD''][N + (k + 1)D']$
        $\quad + n[N' + (k + 1)D''][2N'D' + kD'D'' - ND'']\}\,P_n(k, x)$
        $\quad + n[N' + (k + 1)D'']\{2(N' + kD'')^2 D$
        $\quad - (N + kD')(2N'D' + kD'D'' - ND'')\}\,P_{n-1}(k - 1, x).$


In (10) decrease n and k each by one and replace N and D by their values from 
(1). Thus we get, 


(11)    $[a_1 + (2k - n)b_2][a_1 + 2(k - 1)b_2]\,P_n(k, x)$
        $= \{[a_1 + (2k - 2)b_2][a_1 + 2kb_2][a_1 + (2k - 1)b_2]\,x$
        $\quad + [a_1 + (2k - 2)b_2][a_1 + (2k - n)b_2][a_0 + kb_1]$
        $\quad + (n - 1)[a_1 + 2kb_2][a_1b_1 + (k - 1)b_1b_2 - a_0b_2]\}\,P_{n-1}(k - 1, x)$
        $\quad + (n - 1)[a_1 + 2kb_2]\{b_0[a_1 + (2k - 2)b_2]^2$
        $\quad - [a_0 + (k - 1)b_1][a_1b_1 + (k - 1)b_1b_2 - a_0b_2]\}\,P_{n-2}(k - 2, x).$


In this recurrence formula the $P_n(k, x)$ have in general a coefficient of $x^n$
different from one. Polynomials which have one for the coefficient of $x^n$ we will refer
to in the future as normalized. Let us now transform (11) for normalized $P_n(k, x)$.
Theorem 1 deals with polynomials normalized in the above sense. Let us write,
$P_n(k, x) = a_{n,k}\,x^n - b\,x^{n-1} + \cdots$. In (4) set, $\Phi_n(x) = P_n(k, x)/a_{n,k}$.
Thus we get,

(12)    $P_n(k, x) = (A_n x - B_n)\,P_{n-1}(k - 1, x) - \gamma_n\,P_{n-2}(k - 2, x)$
where 


Relation (12) is essentially of the same form as (11). Each of these is to be 
reduced to form (4). 


From a previous paper by the author⁷ we have,

(13)    $P'_{n+1}(k, x) = (n + 1)[N' + \tfrac{1}{2}(2k - n)D'']\,P_n(k, x).$

$n - 1$ successive applications of this relation give us, $[P_0(k, x) = 1]$, that the
coefficient of $x^n$ in $P_n(k, x)$ is,


7 Frank S. Beale, ‘‘On the polynomials related to Pearson’s differential equation,” 
Annals of Math. Stat., Vol. 8(1937), p. 207 (2). 





(14)    $a_{n,k} = \prod_{i=0}^{n-1} [a_1 + (2k - n + 1 + i)b_2].$

By employing (14) in (12) we see that (12) or (11) reduces to form (4) where,

(15)    $c_n = -\frac{[a_1 + (2k - n)b_2][a_0 + kb_1]}{[a_1 + 2kb_2][a_1 + (2k - 1)b_2]} - \frac{(n - 1)[a_1b_1 + (k - 1)b_1b_2 - a_0b_2]}{[a_1 + (2k - 1)b_2][a_1 + (2k - 2)b_2]},$

(16)    $\lambda_n = -\frac{(n - 1)[a_1 + (2k - n - 1)b_2]\{b_0[a_1 + (2k - 2)b_2]^2 - [a_0 + (k - 1)b_1][a_1b_1 + (k - 1)b_1b_2 - a_0b_2]\}}{[a_1 + (2k - 3)b_2][a_1 + (2k - 2)b_2]^2[a_1 + (2k - 1)b_2]}.$


Equation (16) together with Theorems 1 and 2 can now be applied to the
polynomials $P_n(k, x)$.
From (14) it is seen that $P_n(k, x)$ is of degree $n$ provided that none of the factors
of the product vanishes. This condition we assume to hold here for all $n$.
We can now obtain a recurrence relation for the $q$th derivatives of $P_n(k, x)$.
A repeated application of (13) leads to,

(17)    $\frac{d^q}{dx^q} P_n(k, x) = P_{n-q}(k, x) \prod_{i=0}^{q-1} (n - i)[a_1 + (2k - n + i + 1)b_2],$

where $P_{n-q}(k, x)$ is not normalized in the above sense. By considering the right
side of (17) together with (14) we see that (17) can be divided by

$a_{n-q,k} \prod_{i=0}^{q-1} (n - i)[a_1 + (2k - n + i + 1)b_2]$

and thus normalize the polynomials on both the right and left sides of (17).
Consequently the recurrence relation for normalized $d^q[P_n(k, x)]/dx^q$, $n =
0, 1, 2, \cdots$, is identical with the recurrence relation for normalized $P_{n-q}(k, x)$
as given by (4), (15) and (16) when we replace $n$ by $n - q$ in these latter.


2. The different types of orthogonal $P_n(n, x)$. Suppose first that $b_2 \neq 0$ in
(1). A transformation on $x$ with real coefficients can be effected which changes
(1) into either,

(18)    $\frac{1}{y}\frac{dy}{dx} = \frac{(\alpha - \beta) + (-\alpha - \beta)x}{1 - x^2},$ or

(19)    $\frac{1}{y}\frac{dy}{dx} = \frac{-2mx - q}{a^2 + x^2}.$

(A) Equation (18) together with (2) for $k = n$ defines the generalized Jacobi
Polynomials (normalized in the above sense),

$J_n(x, \alpha, \beta) = \frac{1}{a_{n,n}}\,(1 + x)^{-\alpha}(1 - x)^{-\beta}\,\frac{d^n}{dx^n}\left[(1 + x)^{n+\alpha}(1 - x)^{n+\beta}\right]$

where $1/a_{n,n}$ is given by (14). If in (16) we set $k = n$ and make proper
replacements for constants as (18) and (1) show we have,

(20)    $\lambda_n = \frac{4(n - 1)(\alpha + \beta + n - 1)(\alpha + n - 1)(\beta + n - 1)}{(\alpha + \beta + 2n - 3)(\alpha + \beta + 2n - 2)^2(\alpha + \beta + 2n - 1)}, \qquad n \geq 2.$

From Theorem 1 and this value of $\lambda_n$ we conclude that if $\alpha > -1$, $\beta > -1$,
the sequence $\{J_n(x, \alpha, \beta)\}$ is orthogonal in the restricted sense, a well-known
result. From Theorem 2 we can similarly conclude that if neither $\alpha$, $\beta$, nor
$(\alpha + \beta)$ equals $-j$, $j$ a positive integer, the sequence $\{J_n(x, \alpha, \beta)\}$ is orthogonal
in the general sense.

(A₁) If in (18) we set $\alpha = \beta = 0$ we obtain a differential equation which
together with (2) for $k = n$ leads to the Legendre Polynomials, (normalized in
above sense), $P_n(x) = \frac{1}{a_{n,n}}\frac{d^n}{dx^n}(x^2 - 1)^n$. Setting $\alpha = \beta = 0$ in (20) leads to
$\lambda_n = \frac{(n - 1)^2}{(2n - 3)(2n - 1)}$, $n \geq 2$. Thus from Theorem 1 we conclude that the
Legendre Polynomials are orthogonal in the restricted sense, a result well known.
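The Legendre case lends itself to an exact check. The sketch below is an illustration and not part of the paper: it generates the normalized polynomials from the recurrence (4) with $c_n = 0$ and $\lambda_n = (n-1)^2/((2n-3)(2n-1))$, and integrates products over $(-1, 1)$ with unit weight in rational arithmetic. The function names are hypothetical.

```python
from fractions import Fraction

def monic_legendre(n_max):
    """Coefficient lists (lowest degree first) of the normalized Legendre polynomials."""
    polys = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for n in range(2, n_max + 1):
        lam = Fraction((n - 1) ** 2, (2 * n - 3) * (2 * n - 1))
        new = [Fraction(0)] + polys[n - 1]          # x * Phi_{n-1}
        for j, coef in enumerate(polys[n - 2]):
            new[j] -= lam * coef                    # - lambda_n * Phi_{n-2}
        polys.append(new)
    return polys

def inner_product(p, q):
    """Exact integral of p(x) q(x) over [-1, 1]."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:                    # odd powers integrate to zero
                total += a * b * Fraction(2, i + j + 1)
    return total
```

The first polynomials produced are $x^2 - \frac{1}{3}$ and $x^3 - \frac{3}{5}x$, and every product of two different members integrates exactly to zero, in agreement with orthogonality in the restricted sense.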
(B) Equation (19) together with (2) for $k = n$ leads to a class of polynomials
(normalized in above sense), mentioned by Romanovsky.⁸

$R_n(x, m, q, a) = \frac{1}{a_{n,n}}\,(a^2 + x^2)^{m} \exp\left(\frac{q}{a}\tan^{-1}\frac{x}{a}\right) \cdot \frac{d^n}{dx^n}\left[(a^2 + x^2)^{n-m} \exp\left(-\frac{q}{a}\tan^{-1}\frac{x}{a}\right)\right]$

where again $1/a_{n,n}$ is given by (14). In (16) set $k = n$ and make the proper
replacements of constants and,

$\lambda_n = \frac{n - 1}{4}\cdot\frac{(2m - n + 1)\{4a^2(m - n + 1)^2 + q^2\}}{(2m - 2n + 3)(m - n + 1)^2(2m - 2n + 1)}, \qquad n \geq 2.$

From Theorem 2 it now follows that the sequence $\{R_n(x, m, q, a)\}$ is orthogonal
in the general sense if $m \neq j/2$, $j$ a positive integer. There is no set of parameters
$m$, $q$, $a$ which assures orthogonality in the restricted sense.

In connection with Romanovsky’s note there appear to be several discrepan- 
cies. For the weight functions given there under types IV and V, the nth 
moments for sufficiently large n do not exist over the intervals there considered. 
Type V is the special case of type IV for a = 0. Type VI is none other than 
Jacobi Polynomials so that the orthogonality relations given there for this case 
are incorrect. In all three types listed certain of the recurrence relations for 
the polynomials are in error. 

(B₁) We note here one special sub-class of $R_n$. Take $m = q = 0$ and $a = 1$
in (19). We obtain from (2) and (14) a system of normalized polynomials
analogous to the Legendre Polynomials namely, $\varphi_n(x) = \frac{1}{a_{n,n}}\frac{d^n}{dx^n}(x^2 + 1)^n$.
It is easy to verify for these that,

$\int_{-i}^{i} \varphi_m(x)\,\varphi_n(x)\,dx = 0, \qquad m \neq n, \quad i = \sqrt{-1}.$

8 V. Romanovsky, "Sur quelques classes nouvelles de polynomes orthogonaux," Comptes
Rendus, Vol. 188 (1929), pp. 1023-1025.


(C) Suppose that in (1), $b_2 = 0$, $b_1 \neq 0$. A linear transformation with real
coefficients changes (1) into, $\frac{1}{y}\frac{dy}{dx} = \frac{\alpha - x}{x}$. This equation together with (2)
and (14) for $k = n$ defines the generalized Laguerre Polynomials, (normalized
in above sense), $L_n(x, \alpha) = (-1)^n x^{-\alpha} e^{x}\,\frac{d^n}{dx^n}[x^{n+\alpha} e^{-x}]$. Setting $k = n$ and making
proper replacements in (16) we get, $\lambda_n = (n - 1)(\alpha + n - 1)$, $n \geq 2$. From
Theorem 1 we see that if $\alpha > -1$ the $L_n$ are orthogonal in the restricted sense,
a well-known result. From Theorem 2 we can say that if $\alpha \neq -j$, $j$ a positive
integer, the polynomials are orthogonal in the general sense.

(D) If in (1), $b_1 = b_2 = 0$, $b_0 \neq 0$ we can perform a linear transformation on
$x$ with real coefficients and get, $\frac{1}{y}\frac{dy}{dx} = hx$. This differential equation together
with (2) and (14) gives a set of normalized polynomials
$G_n(x) = \frac{1}{h^n}\, e^{-hx^2/2}\,\frac{d^n}{dx^n}\, e^{hx^2/2}$.
Taking $k = n$ and making proper substitutions for constants in (16) we get
$\lambda_n = -(n - 1)/h$, $n \geq 2$. If $h$ is negative it follows from Theorem 1 that the
sequence $\{G_n(x)\}$ is orthogonal in the restricted sense. In fact, $G_n(x) = H_n(x) =$
Hermite Polynomials.

On the other hand, if $h$ is positive we have from Theorem 2 orthogonality in
the general sense. In fact, it can be easily verified for this case that,

$\int_{-i\infty}^{i\infty} e^{hx^2/2}\, G_m(x)\, G_n(x)\,dx = 0, \qquad m \neq n, \quad i = \sqrt{-1}.$

(E) The only remaining possibility for (1) not so far discussed occurs when
$N$ = constant and $D$ is linear. In this case it has been shown that $P_n(k, x)$
of (2) reduces to a constant.⁹
E. H. Hildebrandt has shown¹⁰ that the polynomials $P_n(n, x)$ of (2) satisfy
a differential equation of the form,

(21)    $(b_0 + b_1 x + b_2 x^2)\frac{d^2y}{dx^2} + [a_0 + b_1 + (a_1 + 2b_2)x]\frac{dy}{dx} - n[a_1 + (n + 1)b_2]\,y = 0, \qquad n = 1, 2, 3, \cdots.$

Moreover with the coefficients of $d^2y/dx^2$ and $dy/dx$ in (21) he has shown that
for (21) to have a polynomial solution of degree $n$ the coefficient of $y$ must be
of the form given in (21).

9 Frank S. Beale, loc. cit. p. 209, Theorem I.
10 Loc. cit. pp. 404-405.








From (16) we can say that for $k = n$ and an orthogonal sequence $P_n(n, x)$,
$n = 0, 1, 2, \cdots$ we have,

(22)    $a_1 + (n - 1)b_2 \neq 0,$
(23)    $b_0[a_1 + (2n - 2)b_2]^2 - [a_0 + (n - 1)b_1][a_1b_1 + (n - 1)b_1b_2 - a_0b_2] \neq 0,$

where $n$ is an integer $\geq 2$. Considering for (21) a solution of the type
$y = \sum_{i=0}^{n} c_i x^i$ we readily show that if (22) and (23) are satisfied, (21) possesses for
each $n$ a single polynomial solution of degree $n$. Two solutions which differ
merely by a constant factor are regarded as the same solution. This polynomial
solution of (21) must be $P_n(n, x)$.

By employing theorems from a previous paper by the author¹¹ we can show
that if (22) and (23) are satisfied, the zeros of the polynomials of section 2 are
simple whether these zeros are real or complex.

Hahn has shown¹² that if a set of normalized polynomials and their derivatives
satisfy a relation of the form (4) with $\lambda_i \neq 0$ and if the zeros of the
nomials are all simple then the polynomials must necessarily satisfy an equation 
of form (21). Since in this paper we have considered all possible values of 
a;, (¢ = 0, 1), and b;, (¢ = 0, 1, 2), which lead to orthogonal polynomials, it 
follows that the only systems of polynomials with simple zeros and orthogonal 
in either restricted or general sense whose derivatives in turn are orthogonal in 
either sense are the systems of section 2. 


11 Loc. cit. pp. 207-209, Theorems I, to In. 
12 Loc. cit. pp. 634-636. 














THE SKEWNESS OF THE RESIDUALS IN LINEAR REGRESSION 
THEORY 


By P. S. Dwyer 
University of Michigan, Ann Arbor, Mich. 


In obtaining the regression of $y$ on $x$ it is customary to show the relation
between the actual and the estimated $y$ by computing the standard deviation
of the residuals with the use of the formula $\sigma_e = \sigma_y\sqrt{1 - r^2}$. If the errors
are distributed normally one may estimate the number of values coming within
one standard deviation, within two standard deviations, etc., of the regression
line. However these errors are not always distributed normally, and in such
a case it seems wiser to compute the skewness of the residuals and to use a
Pearson Type III curve in making the interpretation. The present paper outlines
a technique for the calculation of $\alpha_{3:e}$ which is feasible from a practical
standpoint. It is based (a) on a cumulative totals method of obtaining the 
correlation coefficient which, at the same time, makes possible the determination 
of the third order moments needed to evaluate the skewness and (b) on an effi- 
cient ritual for computing the coefficient of skewness from the moments. 

The determination of the normality or non-normality of the residuals is not 
always immediately evident. If the scatter diagram or correlation chart is 
presented, one can make an estimate of the extent of normality but if not, and 
the most modern and efficient computational methods do not utilize the correla- 
tion chart, there is no way by which the presence or absence of normality can 
be detected. Some research workers are opposed to the use of the more efficient 
methods (particularly the use of the Hollerith tabulators) because the correla- 
tion chart is not presented. Though within limits it is possible to use the 
tabulator to present the correlation chart simultaneously with the values needed 
to compute the correlation coefficient [1], it is here suggested that the computa- 
tion of the skewness of residuals, which can now be accomplished quite easily 
from the tabulator runs, may be substituted for the examination of the
correlation chart.

The classical least squares theory makes use of

(1)    $e = y - b_0 - b_1 x$

where $b_0$ and $b_1$ are the solutions of the normal equations. We note that the
first normal equation is $\Sigma e = 0$ so that $M_e = 0$ and the residual is a deviation.
It follows that the skewness of residuals is

(2)    $\alpha_{3:e} = \frac{\Sigma(y - b_0 - b_1 x)^3}{N\sigma_e^3}.$





We wish to compute $\alpha_{3:e}$ without computing the individual residuals. The
denominator causes us little concern but it seems discouraging to evaluate such
an expression as

$\Sigma y^3 - Nb_0^3 - b_1^3\Sigma x^3 - 3b_0\Sigma y^2 - 3b_1\Sigma xy^2 + 3b_0^2\Sigma y$
$\quad - 3b_0^2b_1\Sigma x + 3b_1^2\Sigma x^2y - 3b_1^2b_0\Sigma x^2 + 6b_0b_1\Sigma xy$

even though the values of $b_0$, $b_1$, $N$, $\Sigma x$, $\Sigma y$, $\Sigma x^2$, $\Sigma xy$, $\Sigma y^2$, $\Sigma x^3$, $\Sigma x^2y$,
$\Sigma xy^2$, $\Sigma y^3$ are available.

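A first simplification is made by summing (1) and dividing by $N$. We then
have

(3)    $M_e = M_y - b_0 - b_1 M_x$

and by subtracting (3) from (1) and denoting deviations by barred letters,
we have

(4)    $\bar{e} = \bar{y} - b_1\bar{x}$

so that the skewness of errors is

(5)    $\alpha_{3:e} = \frac{\Sigma\bar{y}^3 - 3b_1\Sigma\bar{x}\bar{y}^2 + 3b_1^2\Sigma\bar{x}^2\bar{y} - b_1^3\Sigma\bar{x}^3}{N\sigma_e^3}.$

This formula can also be expressed as

(6)    $\alpha_{3:e} = \frac{\bar{\mu}_{03} - 3b_1\bar{\mu}_{12} + 3b_1^2\bar{\mu}_{21} - b_1^3\bar{\mu}_{30}}{[\bar{\mu}_{02} - b_1\bar{\mu}_{11}]^{3/2}}.$

A similar formula for the skewness of the residuals of $x$ on $y$ is

(7)    $\alpha_{3:e'} = \frac{\bar{\mu}_{30} - 3b_1'\bar{\mu}_{21} + 3b_1'^2\bar{\mu}_{12} - b_1'^3\bar{\mu}_{03}}{[\bar{\mu}_{20} - b_1'\bar{\mu}_{11}]^{3/2}}.$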
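Formula (6) can be checked against a direct computation from the individual residuals. The sketch below is an illustration on made-up data; it assumes that $\bar{\mu}_{ij}$ denotes the mean of $\bar{x}^i\bar{y}^j$ taken over the $N$ pairs, and the function names are not from the paper.

```python
def central_moment(xs, ys, i, j):
    """Mean of (x - M_x)^i (y - M_y)^j over the sample."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) ** i * (y - my) ** j for x, y in zip(xs, ys)) / n

def skew_from_moments(xs, ys):
    """Skewness of the residuals of y on x by formula (6)."""
    mu = lambda i, j: central_moment(xs, ys, i, j)
    b1 = mu(1, 1) / mu(2, 0)
    numerator = mu(0, 3) - 3 * b1 * mu(1, 2) + 3 * b1 ** 2 * mu(2, 1) - b1 ** 3 * mu(3, 0)
    return numerator / (mu(0, 2) - b1 * mu(1, 1)) ** 1.5

def skew_direct(xs, ys):
    """Skewness of the residuals of y on x computed from the residuals themselves."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = central_moment(xs, ys, 1, 1) / central_moment(xs, ys, 2, 0)
    e = [(y - my) - b1 * (x - mx) for x, y in zip(xs, ys)]
    m2 = sum(v ** 2 for v in e) / n
    m3 = sum(v ** 3 for v in e) / n
    return m3 / m2 ** 1.5
```

The two functions agree to rounding error on any data for which the residual variance is not zero, which is the practical content of (5) and (6): the residuals themselves need never be formed.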


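For theoretical purposes formula (6) may be put in standard units with
$b_1 = r\frac{\sigma_y}{\sigma_x}$, $b_1' = r\frac{\sigma_x}{\sigma_y}$, $\bar{\mu}_{30} = \alpha_{30}\sigma_x^3$, $\bar{\mu}_{21} = \alpha_{21}\sigma_x^2\sigma_y$, etc. with the resulting

(8)    $\alpha_{3:e} = \frac{\alpha_{03} - 3r\alpha_{12} + 3r^2\alpha_{21} - r^3\alpha_{30}}{(1 - r^2)^{3/2}}.$

As $r \to 0$, $\alpha_{3:e} \to \alpha_{3:y}$ just as $\sigma_e \to \sigma_y$ as $r \to 0$.

Formulas (6) and (7) are of some theoretical importance in that they show how
the skewness of the residuals is connected with the skewness of the marginal
distribution. Thus

as $\bar{\mu}_{11} \to 0$, $b_1$ and $b_1' \to 0$ and $\alpha_{3:e} \to \alpha_{3:y}$; $\alpha_{3:e'} \to \alpha_{3:x}$;
as $b_1 \to \infty$, $\alpha_{3:e} \to -\alpha_{3:x}$ and as $b_1' \to \infty$, $\alpha_{3:e'} \to -\alpha_{3:y}$;
as $b_1 \to 1$, $\alpha_{3:e} \to \alpha_{3:y-x}$. Similarly as $b_1' \to 1$, $\alpha_{3:e'} \to \alpha_{3:x-y}$.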







It is hence possible in some cases to get a good approximation to the skewness 
of the residuals if the regression coefficients and the skewness of the marginal 
distribution are known. 


TABLE I
Correlation from first order cumulations
[The body of the table, with columns numbered (1) to (14), is not legible in this copy.
Totals that remain legible in the lower right-hand box: $\Sigma y$ = 2993, $\Sigma xy$ = 12815,
$\Sigma x$ = 4399, $\Sigma x^2$ = 20245.]


For actual computation, we use (6) and (7). It has been indicated previously
how the values $\Sigma x$, $\Sigma y$, $\Sigma x^2$, $\Sigma xy$, $\Sigma y^2$, $\Sigma x^3$ and $\Sigma y^3$ could be obtained with the
use of cumulations. An illustration used previously [2] is presented in Table I.
The information was obtained from the Office of Educational Investigations of
the University of Michigan and gives the University first semester average (X)
and the high school average (Y) for 1,126 students entering the College of
Literature, Science, and the Arts in 1928.

The new origin of each variable is taken at the class mark of the lowest class
rather than at the class mark of a middle class as is conventional. In this way
all negative terms are avoided in the computation of the moments. The $x$'s are
arranged in descending order from left to right and the $y$'s in descending order
from top to bottom. The notation $x_y$ is used to indicate the sum of all the $x$'s
having the same value of $y$. Thus the first entry in column 13 is
$5\cdot8 + 2\cdot7 + 5\cdot6 + 5\cdot5 + 1\cdot4 = 113$. The column $Cx_y$ is obtained by cumulating the values
of $x_y$. Similarly $y_y$ is the sum of all the $y$'s having the same value and the first
entry in column 14 is $18(6) = 108$. The entries $Cy_y$, $Cy_x$, and $Cx_x$ are obtained
similarly.

The entries $\Sigma x$, $\Sigma y$, $\Sigma x^2$, $\Sigma xy$, $\Sigma y^2$ are found in the lower right hand box in this
position:

| $\Sigma y$ | $\Sigma xy$ | $\Sigma y^2$ |
| $\Sigma x$ | $\Sigma x^2$ |     |

The values of $\Sigma x$ and $\Sigma y$ are obtained from the final cumulations while the value
of $\Sigma xy$ is obtained by adding the entries in the column above, or, as a check,
the entries in the row to the left. The value of $\Sigma y^2$ is obtained by adding the
entries in the row at the left of the box while the value $\Sigma x^2$ is obtained by adding
the entries above the box.

The values of the third order sums are obtained by multiplying the entries
above the box and to the left of the box successively by 1, 3, 5, 7, 9, etc. Thus,

(9)    $\Sigma x^3 = 4399 + 3(4339) + 5(4097) + \text{etc.} = 102{,}103,$
       $\Sigma x^2y = 2923 + 3(2809) + 5(2578) + \text{etc.} = 63{,}121,$
       $\Sigma xy^2 = 4244 + 3(3714) + 5(2568) + \text{etc.} = 46{,}047,$
       $\Sigma y^3 = 2993 + 3(2820) + 5(2160) + \text{etc.} = 38{,}633.$

In making the reductions we use $ab - cd$ operations as much as possible.
We first compute

(10)    $A_{xy} = N\Sigma xy - (\Sigma x)(\Sigma y),$
        $A_{xx} = N\Sigma x^2 - (\Sigma x)^2,$
        $A_{x^2,y} = N\Sigma x^2y - (\Sigma x^2)(\Sigma y).$

We note too that

(11)    $\bar{\mu}_{30} = [NA_{x^2,x} - (2\Sigma x)(A_{xx})]/N^3; \quad \bar{\mu}_{21} = [NA_{x^2,y} - (2\Sigma x)(A_{xy})]/N^3;$
        $\bar{\mu}_{12} = [NA_{y^2,x} - (2\Sigma y)(A_{xy})]/N^3; \quad \bar{\mu}_{03} = [NA_{y^2,y} - (2\Sigma y)(A_{yy})]/N^3$

and finally we get $\alpha_{3:e}$ or $\alpha_{3:e'}$ by (6) or (7).

The general solution is outlined on the left of Table II. We record in Fig. A
the values given by (9) and in the Fig. B the values resulting from the
application of (10). The values $2\Sigma y$ and $2\Sigma x$ are inserted in Fig. B to facilitate the
calculation of Fig. C which gives the values of (11). The technique is very
easily carried out once it is understood. It can be performed with hand
calculators but it is ideally adapted to the use of the latest Marchant, Fridén, and
Monroe models equipped with automatic positive and negative multiplication,
so that $ab - cd$ operations can be performed with a minimum of effort and a
maximum of accuracy. Actually the value of "$a$," which is the total frequency, is
the same for many of these operations so that there is further saving if a
machine is used which permits the locking in of a constant in such a way that it
can be used, without continued key punching, in later $ab - cd$ operations.
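The chain of $ab - cd$ reductions can be followed on the figures of the worked example. The sketch below is an illustration only: it starts from the sums recorded in Fig. A of Table II, applies (10) and (11), and then evaluates (7) for the regression of $x$ on $y$. The variable names are hypothetical, and the quantities $A_{y^2,x}$ and $A_{y^2,y}$ are formed by analogy with the $A$'s written out in (10).

```python
import math

# Sums from Fig. A of Table II (coded values x, y).
N = 1126
Sx, Sx2, Sx3 = 4399, 20245, 102103
Sy, Sxy, Sx2y = 2993, 12815, 63121
Sy2, Sxy2, Sy3 = 10069, 46047, 38633

def ab_cd(a, b, c, d):
    """One 'ab - cd' machine operation."""
    return a * b - c * d

# (10): quantities recorded in Fig. B.
A_xx = ab_cd(N, Sx2, Sx, Sx)
A_xy = ab_cd(N, Sxy, Sx, Sy)
A_yy = ab_cd(N, Sy2, Sy, Sy)
A_x2x = ab_cd(N, Sx3, Sx2, Sx)
A_x2y = ab_cd(N, Sx2y, Sx2, Sy)
A_y2x = ab_cd(N, Sxy2, Sy2, Sx)
A_y2y = ab_cd(N, Sy3, Sy2, Sy)

# (11) and Fig. D: second and third order central moments.
mu20, mu11, mu02 = A_xx / N**2, A_xy / N**2, A_yy / N**2
mu30 = ab_cd(N, A_x2x, 2 * Sx, A_xx) / N**3
mu21 = ab_cd(N, A_x2y, 2 * Sx, A_xy) / N**3
mu12 = ab_cd(N, A_y2x, 2 * Sy, A_xy) / N**3
mu03 = ab_cd(N, A_y2y, 2 * Sy, A_yy) / N**3

# Regression of x on y, correlation, and skewness of residuals by (7).
b1_prime = mu11 / mu02
r = mu11 / math.sqrt(mu20 * mu02)
skew_x_on_y = ((mu30 - 3 * b1_prime * mu21 + 3 * b1_prime**2 * mu12 - b1_prime**3 * mu03)
               / (mu20 - b1_prime * mu11) ** 1.5)
```

Because every intermediate quantity is an integer until the final divisions, the reductions carry no rounding error, which is one reason the $ab - cd$ arrangement suits a desk calculator.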


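TABLE II
Abbreviated techniques for computing third order central moments, etc.

Fig. A
| $N$ = 1126 | $\Sigma x$ = 4399 | $\Sigma x^2$ = 20245 | $\Sigma x^3$ = 102103 |
|            | $\Sigma y$ = 2993 | $\Sigma xy$ = 12815  | $\Sigma x^2y$ = 63121 |
|            |                   | $\Sigma y^2$ = 10069 | $\Sigma xy^2$ = 46047 |
|            |                   |                      | $\Sigma y^3$ = 38633  |

Fig. B
| $N$ = 1126 | $2\Sigma x$ = 8798 | $A_{xx}$ = 3444669 | $A_{x^2,x}$ = 25910223 |
|            | $2\Sigma y$ = 5986 | $A_{xy}$ = 1263483 | $A_{x^2,y}$ = 10480961 |
|            |                    | $A_{yy}$ = 2379645 | $A_{y^2,x}$ = 7555391  |
|            |                    |                    | $A_{y^2,y}$ = 13364241 |

Fig. C
| $N$ = 1126 |  | 3444669 | -1131286764 |
|            |  | 1263483 | 685438652   |
|            |  | 2379645 | 944161028   |
|            |  |         | 803580396   |

Fig. D
| $N$ = 1126 | $b_1$ (.367)  | $\bar{\mu}_{20}$ = 2.717 | $\bar{\mu}_{30}$ = -.7925 |                           |
|            | $b_1'$ (.531) | $\bar{\mu}_{11}$ = .997  | $\bar{\mu}_{21}$ = .4801  | $-3b_1'$ (-1.593)         |
|            |               | $\bar{\mu}_{02}$ = 1.877 | $\bar{\mu}_{12}$ = .6614  | $3b_1'^2$ (.846)          |
|            |               |                          | $\bar{\mu}_{03}$ = .5629  | $-b_1'^3$ (-.150)         |

The values in Fig. D are obtained by dividing the values $A_{yy}$, $A_{xy}$, and
$A_{xx}$ in Fig. C by $N^2$ and the values in the diagonal below, $NA_{y^2,y} - (2\Sigma y)A_{yy}$,
etc., by $N^3$. The values $b_1 = \bar{\mu}_{11}/\bar{\mu}_{20}$ and $b_1' = \bar{\mu}_{11}/\bar{\mu}_{02}$ can be inserted in Fig. D adjacent
to the $N$. The value of the correlation coefficient is
$r = \sqrt{b_1 b_1'} = \frac{\bar{\mu}_{11}}{\sqrt{\bar{\mu}_{20}\bar{\mu}_{02}}}$.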


SKEWNESS OF RESIDUALS 109 


We have too, $\sigma_{x.y} = \sqrt{\bar\mu_{20} - b_1'\bar\mu_{11}}$ and $\sigma_{y.x} = \sqrt{\bar\mu_{02} - b_1\bar\mu_{11}}$, so that the standard
deviation of residuals is readily computed from the entries of Fig. D. The
numerator of (6) is readily obtained after entering $-3b_1'$, $3b_1'^2$, $-b_1'^3$ in the
diagonal under the diagonal containing the third moments and multiplying by
columns. The numerator of (7) is obtained by entering $-b_1^3$, $3b_1^2$, $-3b_1$
in the same diagonal and multiplying by rows. The theory is applied to the
results of Table I and the details are presented at the right of Table II. It is
to be noted that all values indicated here are the coded values x, y and not the
original values, X, Y. However, the correlation coefficient and the skewness
of errors are independent of any such change in unit, grouping errors being
neglected.

From Fig. D we see that $b_1 = .997/2.717 = .367$, that $b_1' = .997/1.877 = .531$
and that $r = \sqrt{(.367)(.531)} = .441$. In this case we wish to estimate
college record, x, from high school record, y, so we use $b_1' = .531$ and compute
$-3b_1' = -1.593$, $3b_1'^2 = .846$, $-b_1'^3 = -.150$. It follows that

$\alpha_{3:x.y} = \dfrac{-.7925 + (.4801)(-1.593) + (.6614)(.846) + (.5629)(-.150)}{[2.717 - .531(.997)]^{3/2}} = -.334.$

It thus appears that a better picture of the variation of the residuals in this
case is obtained with the use of a Pearson Type III with $\alpha_3$ approximately $-\tfrac{1}{3}$
than is obtained with the use of a normal curve. It is not necessary, of course,
to form Fig. D as the results can all be obtained from Fig. C. Thus if we
multiply the numerator and denominator of (6) by $N^3$, we get entries, with the
exception of the b's, which are in Fig. C. Now in this case $b_1' = A_{xy}/A_{yy}$ and
$b_1 = A_{xy}/A_{xx}$ so that these values can be inserted in the upper left as before. Also the
powers of $b_1'$ can be inserted in the lower right as in Fig. D. We have then

$\alpha_{3:x.y} = \dfrac{-1{,}131{,}286{,}764 + (685{,}438{,}652)(-1.593) + (944{,}161{,}028)(.846) + (803{,}580{,}396)(-.150)}{[3{,}444{,}669 - (1{,}263{,}483)(.531)]^{3/2}}.$


We know however, since the grades were coded, that it is not sensible to carry
results to more than three places (and, indeed, a three place determination of
the skewness is very satisfactory for interpretive purposes even though more
places might be obtained), so we cut down the number of places. The division
of numerator and denominator by $10^6$, and the dropping of the decimals, results in

$\alpha_{3:x.y} = \dfrac{-1131 + 685(-1.593) + 944(.846) + 804(-.150)}{[344 - 126(.531)]^{3/2}} = -.335.$
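The worked example can be retraced with a short Python sketch. The function name `residual_skewness` and its argument names are introduced here for illustration only; the formula is the third moment of the residuals divided by the cube of their standard deviation, written out for both directions of regression. Because the ratio is unchanged when the second-order entries are multiplied by $N^2$ and the third-order entries by $N^3$, the entries of Fig. C can be passed directly, which mirrors the remark that Fig. D need not be formed.

```python
import math

def residual_skewness(m20, m11, m02, m30, m21, m12, m03, estimate="x_from_y"):
    """Skewness of residuals from a two-variable linear regression.

    m20, m11, m02 are second-order and m30, m21, m12, m03 third-order
    central moments (or the same quantities scaled by N^2 and N^3).
    """
    if estimate == "x_from_y":
        b = m11 / m02                      # slope b1' for x on y
        third = m30 - 3 * b * m21 + 3 * b**2 * m12 - b**3 * m03
        variance = m20 - b * m11
    elif estimate == "y_from_x":
        b = m11 / m20                      # slope b1 for y on x
        third = m03 - 3 * b * m12 + 3 * b**2 * m21 - b**3 * m30
        variance = m02 - b * m11
    else:
        raise ValueError("estimate must be 'x_from_y' or 'y_from_x'")
    return third / variance**1.5

# Entries of Fig. C (second-order A's and third-order numerators).
N = 1126
A_xx, A_xy, A_yy = 3444669, 1263483, 2379645
C30, C21, C12, C03 = -1131286764, 685438652, 944161028, 803580396

# Using the Fig. C entries directly ...
alpha_from_fig_c = residual_skewness(A_xx, A_xy, A_yy, C30, C21, C12, C03)

# ... or the moments of Fig. D obtained by division.
alpha_from_fig_d = residual_skewness(
    A_xx / N**2, A_xy / N**2, A_yy / N**2,
    C30 / N**3, C21 / N**3, C12 / N**3, C03 / N**3,
)

r = A_xy / math.sqrt(A_xx * A_yy)          # correlation coefficient
print(round(alpha_from_fig_c, 3), round(alpha_from_fig_d, 3), round(r, 3))
```

The unrounded slope is used here, so the last digit may differ slightly from a hand computation with three-place values of the b's.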


It is possible of course to duplicate the theory indicated in Table II with the
use of moments rather than the A's. In this case Fig. A consists of 1, $\Sigma x/N$,
$\Sigma x^2/N$, etc. We have such formulas as

$a_{x,y} = \dfrac{\Sigma xy}{N} - \dfrac{(\Sigma x)}{N}\dfrac{(\Sigma y)}{N} = \mu_{11} - \mu_{10}\mu_{01},$

where $\mu_{11} = \Sigma xy/N$, $\mu_{10} = \Sigma x/N$, etc.

It would be possible to compute the $\alpha_{4:x.y}$ in a somewhat similar fashion though
it would take somewhat longer. In the first place we would have to compute
$\Sigma x^2y^2$ from the correlation table. This could be done by forming the
cumulation $C(y^2x)$ and multiplying by 1, 3, 5, 7, 9, etc. When this is done, however,
it does not appear that the calculation of the central moments of the fourth order
can be reduced to as simple a ritual as the calculation of the third order moments.

The question should be raised as to the calculation of the skewness when
there are two or more independent variables. This can be done, of course, but
the calculations are lengthy. The point of the present paper is to provide an
easy and simple technique for computing the skewness of residuals in the case
of two-variable linear regression.


REFERENCES 


[1] Paul S. Dwyer and Alan D. Meacham, "The preparation of correlation tables on a
tabulator equipped with digit selection," Jour. Am. Stat. Assn., Vol. 32 (1937),
pp. 654-62. See particularly page 657.

[2] Paul S. Dwyer, "The computation of moments with the use of cumulative totals,"
Annals of Math. Stat., Vol. 9 (1938), pp. 288-304. See particularly pages 299-303.





NOTES 


This section is devoted to brief research and expository articles, notes on methodology 
and other short items. 




NOTE ON THE ADJUSTMENT OF OBSERVATIONS 
By Arthur J. Kavanagh


The Forman Schools, Litchfield, Conn. 


The method of least squares has been extended to the adjustment of observa- 
tions with errors in more than one variable. The history of the development 
and its principal results have been given by Deming [2], [3], [4], [5]. The basis 
is the assumption that for the "best" adjustment the sum of the weighted
squares of all the residuals (observed values minus adjusted values) must be 
made a minimum with respect to the adjustments to the observations and with 
respect to the parameters involved in the conditions the adjusted values must 
satisfy. In certain problems, such as some arising in the study of relative 
growth in biology, this assumption is not adequate; it is necessary that the 
sum to be minimized be generalized to include cross products as well as squares 
of the residuals. 

Suppose we have a set of n universes of q-dimensional points whose centers of
gravity are known to satisfy certain conditions; for instance, they might all lie 
on a certain type of curve. A sample having been taken from each universe, 
the center of gravity of each sample is taken as the observed center of gravity 
of the corresponding universe, and it is desired to determine the most probable 
set of adjustments to the coordinates and the most probable set of parameters 
involved in the conditions, subject to the requirement that the adjusted values 
satisfy the conditions exactly. It is assumed that the sampling distribution of 
the center of gravity in each universe satisfies the multivariate normal law, and 
that the standard deviations and coefficients of correlation of each sample may 
with sufficient accuracy be taken as the constants of the corresponding universe. 
Then by reasoning analogous to that of the derivation of the least squares
principle for one variable from the univariate normal law, the probability of
getting the observed set of values is proportional to $e^{-Q}$, where

(1)    $Q = \sum_{i=1}^{n} Q_i,$

$Q_i$ being a homogeneous quadratic function of the errors at the ith centroid and
in general involving the cross products as well as the squares of the errors.
The probability will be a maximum when Q is a minimum. Consequently the
best estimates for the coordinates of the centroids will be those making Q a
minimum, subject to the conditions which the coordinates must satisfy.

For example it may be desired to study the relation between height and weight 
among growing boys by fitting a curve to the points whose abscissa and ordinate 
are respectively average height and average weight of a particular age group, 
one point corresponding to each age group in the study. The data for such a 
study are obtained from samples of the several age groups. Then the number 
n of universes is the number of age groups being studied, each universe
consisting of the totality of two-dimensional points obtained by pairing the height
with the weight of each boy in the age group. The centroid or “average point” 
of each universe would ideally be obtained from measurements of all the in- 
dividuals of that age, but since sampling must be resorted to it is necessary to 
make allowances for the sampling distributions of the centroids. It is known 
that within each age group there is correlation between height and weight [1]. 
Consequently the sampling distribution of each centroid will exhibit a correla- 
tion which can be expressed in terms of the coefficient of correlation between 
height and weight of the individuals of the universe from which the sampling 
distribution arises. The existence of this correlation results in the presence of 
the cross-product term in the exponent of the bivariate normal formula
describing the sampling distribution of the average values, that is in the $Q_i$ of each
centroid. If there were no such correlations the cross-product term in each $Q_i$
would vanish and the situation would reduce to that of least squares.

In the general case, let $X_{1i}, X_{2i}, \cdots, X_{qi}$ be the observed coordinates of
the ith centroid, $x_{1i}, x_{2i}, \cdots, x_{qi}$ the adjusted values (to be determined), and
$V_{ji} = X_{ji} - x_{ji}$. Then $Q_i$ may be written

(2)    $Q_i = w_{11i}V_{1i}^2 + w_{12i}V_{1i}V_{2i} + \cdots + w_{1qi}V_{1i}V_{qi}$
       $\quad + w_{21i}V_{2i}V_{1i} + w_{22i}V_{2i}^2 + \cdots + w_{2qi}V_{2i}V_{qi}$
       $\quad + \cdots$
       $\quad + w_{q1i}V_{qi}V_{1i} + w_{q2i}V_{qi}V_{2i} + \cdots + w_{qqi}V_{qi}^2,$

the w's being the weights, with $w_{jki} = w_{kji}$. Thus in the case of two variables,
if $N_i$ be the number of items in the ith sample, $r_i$ its coefficient of correlation,
and $\sigma_{1i}$, $\sigma_{2i}$ its standard deviations, then

$w_{11i} = \dfrac{N_i}{2(1 - r_i^2)\sigma_{1i}^2}, \qquad w_{12i} = w_{21i} = \dfrac{-N_i r_i}{2(1 - r_i^2)\sigma_{1i}\sigma_{2i}}, \qquad w_{22i} = \dfrac{N_i}{2(1 - r_i^2)\sigma_{2i}^2}.$
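As an illustration of the two-variable weights, the following Python sketch evaluates them and the resulting quadratic form $Q_i$. The function names and the numerical inputs are hypothetical and chosen only for the example; the check against the inverse covariance matrix of the sample centroid is an added consistency test, not part of the original note.

```python
def two_variable_weights(n_i, r_i, s1, s2):
    """Weights w11, w12 (= w21), w22 for one centroid in the two-variable case."""
    d = 2.0 * (1.0 - r_i * r_i)
    w11 = n_i / (d * s1 * s1)
    w12 = -n_i * r_i / (d * s1 * s2)
    w22 = n_i / (d * s2 * s2)
    return w11, w12, w22

def quadratic_form(v1, v2, w11, w12, w22):
    """Q_i of equation (2) for q = 2; the two cross terms are equal."""
    return w11 * v1 * v1 + 2.0 * w12 * v1 * v2 + w22 * v2 * v2

# Hypothetical sample: 50 items, r = 0.6, standard deviations 2 and 3.
w11, w12, w22 = two_variable_weights(50, 0.6, 2.0, 3.0)

# Consistency check: the weights equal N/2 times the inverse of the
# covariance matrix [[s1^2, r s1 s2], [r s1 s2, s2^2]].
cov = [[4.0, 3.6], [3.6, 9.0]]
det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
inv = [[cov[1][1] / det, -cov[0][1] / det],
       [-cov[1][0] / det, cov[0][0] / det]]

# With zero correlation the cross-product weight vanishes (least squares).
_, w12_uncorrelated, _ = two_variable_weights(50, 0.0, 2.0, 3.0)

print(w11, w12, w22, quadratic_form(0.1, -0.2, w11, w12, w22))
```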

The coefficients of the cross products in Q involve the coefficients of correla- 
tion of distributions. If the latter are all zero the cross products vanish and Q 
reduces to the sum of weighted squares, which is the basic expression of the 
least squares procedure. Consequently, from this point of view, the least squares 
assumption is equivalent to the assumption of zero correlation between the 


errors. The procedure in the more general situation might be called “least 
quadratics’’. 








ADJUSTMENT OF OBSERVATIONS 113 


The Lagrange method of undetermined multipliers can be used to calculate 
the values of the adjustments to the coordinates and the values of the param- 
eters. The procedure is the same as for least squares [2], [3], [5], the only 
difference being the somewhat greater complication of the algebra. We shall 
summarize the development here. 

The condition equations, supposed $\nu$ in number, may be written

$F^h(x_{11}, \cdots, x_{qn};\ p_1, p_2, \cdots, p_r) = 0, \qquad h = 1, 2, \cdots, \nu,$

where each $F^h$ may in general involve any or all of the numbers $x_{ji}$ as well as
any or all of the parameters $p_l$, whose number we suppose to be r. Let

(3)    $F^h_{ji} = \partial F^h/\partial x_{ji}, \qquad F^h_l = \partial F^h/\partial p_l,$

where the X's have been substituted for the x's after differentiation, and each
$p_l$ has been replaced by the best available approximate value $p_{l0}$. Let $F^h_0$ be
the value of $F^h$ after the same substitution. Also let $v_l = p_{l0} - p_l$. Then if
the V's and v's are small the conditions may be written

(4)    $\sum_{i=1}^{n}\sum_{j=1}^{q} F^h_{ji}V_{ji} + \sum_{l=1}^{r} F^h_l v_l = F^h_0, \qquad h = 1, 2, \cdots, \nu.$


Differentiate Q with respect to the V's and equate the result to zero,
eliminating the factor 2. Differentiate (4) with respect to the V's and the v's, multiply
each equation by the corresponding undetermined multiplier $-\lambda_h$, and sum
the results together with the result from differentiating Q. Collecting
coefficients of the differentials $\delta V_{ji}$ and $\delta v_l$, equating to zero and transposing the
terms involving $\lambda_h$, we get

(5)    $w_{11i}V_{1i} + w_{12i}V_{2i} + \cdots + w_{1qi}V_{qi} = [\lambda_h F^h_{1i}]$
       $w_{21i}V_{1i} + w_{22i}V_{2i} + \cdots + w_{2qi}V_{qi} = [\lambda_h F^h_{2i}]$
       $\cdots$
       $w_{q1i}V_{1i} + w_{q2i}V_{2i} + \cdots + w_{qqi}V_{qi} = [\lambda_h F^h_{qi}]$

(6)    $[\lambda_h F^h_l] = 0$

where the brackets denote summation with respect to h.

Equations (5) can be written down easily, since the coefficients $w_{jki}$ appear
in the same order as in (2). The equations corresponding to each i form a
complete set which can be solved independently of those for other values of i.
The solution can be expressed

(7)    $V_{ji} = A_{j1i}[\lambda_h F^h_{1i}] + A_{j2i}[\lambda_h F^h_{2i}] + \cdots + A_{jqi}[\lambda_h F^h_{qi}] = \left[\lambda_h \sum_{k=1}^{q} A_{jki}F^h_{ki}\right]$

where $A_{jki}$ is $(-1)^{j+k}$ times the minor corresponding to $w_{jki}$, divided by the
principal determinant. By symmetry $A_{jki} = A_{kji}$.
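The cofactor rule for the coefficients $A_{jki}$ can be illustrated with a small Python sketch. The helper names (`minor`, `det`, `cofactor_inverse`, `solve_set`) and the numerical weight matrix and right-hand sides are hypothetical; the right-hand sides stand for the bracketed sums $[\lambda_h F^h_{ji}]$, which in practice are known only after the general normal equations have been solved.

```python
def minor(m, row, col):
    """Matrix m with the given row and column removed."""
    return [r[:col] + r[col + 1:] for k, r in enumerate(m) if k != row]

def det(m):
    """Determinant by expansion along the first row (fine for small q)."""
    if not m:
        return 1.0
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** c * m[0][c] * det(minor(m, 0, c)) for c in range(len(m)))

def cofactor_inverse(w):
    """A[j][k] = (-1)^(j+k) * minor of w[k][j] / principal determinant."""
    q, d = len(w), det(w)
    return [[(-1) ** (j + k) * det(minor(w, k, j)) / d for k in range(q)]
            for j in range(q)]

def solve_set(w, rhs):
    """Solve one complete set of equations (5) for the V's of a centroid."""
    a = cofactor_inverse(w)
    return [sum(a[j][k] * rhs[k] for k in range(len(w))) for j in range(len(w))]

# Hypothetical symmetric weight matrix (q = 3) and right-hand sides.
w = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
rhs = [1.0, 2.0, 3.0]
V = solve_set(w, rhs)

# Substituting back into (5) should reproduce the right-hand sides.
residuals = [sum(w[j][k] * V[k] for k in range(3)) - rhs[j] for j in range(3)]
A = cofactor_inverse(w)
print(V, residuals)
```

Expansion by minors is practical only for the small q of such problems; for larger systems an elimination method would be the usual choice.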















The V's in (4) are to be replaced by their values from (7) and the coefficients
of the $\lambda$'s collected. To facilitate this let

$L_{hg} = \sum_{i=1}^{n} L_{hgi},$

$L_{hgi} = \sum_{s=1}^{q}\sum_{r=1}^{q} A_{rsi}F^h_{ri}F^g_{si}.$

Each $L_{hgi}$ can be written down easily from the corresponding $Q_i$ as written in
(2): in each term $w_{rsi}V_{ri}V_{si}$ replace $w_{rsi}$ by $A_{rsi}$, $V_{ri}$ by $F^h_{ri}$, and $V_{si}$ by $F^g_{si}$.
It is important to preserve the order of the subscripts of the V's in (2), and to
treat the diagonal terms $w_{rri}V_{ri}^2$ as though written $w_{rri}V_{ri}V_{ri}$. It is seen that
$L_{hgi} = L_{ghi}$, and $L_{hg} = L_{gh}$. Then the substitution from (7) into (4) gives

(8)    $\sum_{g=1}^{\nu} L_{hg}\lambda_g + \sum_{l=1}^{r} F^h_l v_l = F^h_0, \qquad h = 1, 2, \cdots, \nu.$























Equations (8), with (6), are formally identical with those of the least squares
procedure which are called by Deming the "general normal equations", and
they can be written schematically in the same manner. The further procedure
is identical with that for least squares, involving solution of the general normal
equations for the $\lambda$'s and v's, substitution of the values of the $\lambda$'s into (7) to
obtain the V's, and then adjustment of the observations by use of the V's,
and adjustment of the provisional values of the parameters by use of the v's.

A word of appreciation is due Dr. O. W. Richards of The Spencer Lens
Company for calling this problem to my attention, and for encouragement in the
carrying out of the solution.


REFERENCES 


[1] Joseph Berkson, "Growth changes in physical correlation—height, weight, and
chest-circumference, males," Human Biology, Vol. 1 (1929), pp. 462-502.

[2] W. Edwards Deming, "The application of least squares," Phil. Mag. 7th Ser., Vol.
11 (1931), pp. 146-158.

[3] W. Edwards Deming, "On the application of least squares," Phil. Mag. 7th Ser.,
Vol. 17 (1934), pp. 804-829.

[4] W. Edwards Deming, Proc. Phys. Soc. Lond., Vol. 47 (1935), pp. 92-106.

[5] W. Edwards Deming, Some Notes on Least Squares, Graduate School, Dept. of Agric.,
Washington.










THE ESTIMATION OF A QUOTIENT WHEN THE DENOMINATOR 
IS NORMALLY DISTRIBUTED 


By Robert D. Gordon


Scripps Institution of Oceanography, La Jolla, Calif. 


1. Introduction. In an oceanographic investigation we have to deal with a
time series consisting of single pairs of observed values x, y, of two independent
stochastic variables, whose true (mean) values we shall denote respectively by
a, b. Of interest is the corresponding time series of quotients (b/a), which it
is required to estimate from the observations x, y. Both x and y are
approximately normally distributed about their mean values a, b with rather large
variances $\sigma_x^2$, $\sigma_y^2$ which can be estimated. It is easily possible for x to vanish
or even to be of opposite sign to a, although a cannot itself vanish. The
required estimates of (b/a) should have the property that they can be numerically
integrated, i.e. that an arbitrary sum of such estimates shall equal the
corresponding estimate of the true sum.

Let us define a function $\psi(x)$ to have the property that its mathematical
expectation $E\{\psi(x)\}$ is exactly 1/a, where a = E(x). If such a function exists
we shall have

(1)    $E\{y \cdot \psi(x)\} = E(y) \cdot E\{\psi(x)\} = b \cdot (1/a) = b/a$

so that $y \cdot \psi(x)$ will be an estimate of b/a which has the required property:
namely such estimates can be added, and we have

$E\{y_1\psi(x_1) + y_2\psi(x_2)\} = E\{y_1\psi(x_1)\} + E\{y_2\psi(x_2)\} = b_1/a_1 + b_2/a_2$

as required. It turns out that if x is normally distributed with non-zero mean
such a function $\psi(x)$ does exist, and is given by the formula

(2)    $\psi(x) = \dfrac{1}{\sigma_x}\exp(x^2/2\sigma_x^2)\int_{x/\sigma_x}^{\infty} e^{-t^2/2}\,dt = \dfrac{1}{\sigma_x}R_{x/\sigma_x}$

where $R_x$ is the "ratio of the area to the bounding ordinate" which is tabulated
by J. P. Mills,¹ also in Pearson's tables.² Equation (2) holds if a is positive; if
a is negative the integration should extend over $(x/\sigma_x, -\infty)$. It is easy to
verify that

(3)    $E(\psi(x)) = \dfrac{1}{\sqrt{2\pi}\,\sigma_x}\int_{-\infty}^{\infty}\psi(x)\exp\left(-\dfrac{(x-a)^2}{2\sigma_x^2}\right)dx = \dfrac{1}{a}$

by direct substitution from (2).
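Relation (3) can be checked numerically. The Python sketch below is an illustration under stated assumptions: the function defined by (2) is called `psi` here, Mills' ratio is evaluated through the complementary error function `math.erfc`, and the integral is approximated by Simpson's rule over a finite range that is adequate only for positive a and moderate values of a divided by the standard deviation (the integration limits and step count are choices made for this example).

```python
import math

def mills_ratio(z):
    """R_z = exp(z^2/2) * integral from z to infinity of exp(-t^2/2) dt."""
    return math.sqrt(math.pi / 2.0) * math.exp(0.5 * z * z) * math.erfc(z / math.sqrt(2.0))

def psi(x, sigma):
    """The function of equation (2), valid when the mean a is positive."""
    return mills_ratio(x / sigma) / sigma

def normal_pdf(x, a, sigma):
    z = (x - a) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def expected_psi(a, sigma, lower_sd=22.0, upper_sd=10.0, steps=6400):
    """Simpson approximation to E{psi(x)} for x normal with mean a > 0."""
    lo, hi = a - lower_sd * sigma, a + upper_sd * sigma
    h = (hi - lo) / steps

    def g(x):
        return psi(x, sigma) * normal_pdf(x, a, sigma)

    total = g(lo) + g(hi)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * g(lo + k * h)
    return total * h / 3.0

for a, sigma in ((2.0, 1.0), (3.0, 1.5)):
    print(a, sigma, expected_psi(a, sigma), 1.0 / a)
```

The lower limit is taken far to the left because the integrand decays only exponentially there; this slow decay is also what makes the second moment diverge.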





1 J. P. Mills, ‘““Table of ratio: area to bounding ordinate, for any portion of the normal 
curve,’’ Biometrika, Vol. 18 (1926), pp. 395-400. 

2 Karl Pearson, Tables for Statisticians and Biometricians, part II, table III, Cambridge 
Univ. Press. 




2. The law of large numbers for $\psi(x)$. The function $\psi(x)$ defined by (2) has
mean value 1/a as required, but its second moment (hence variance) does not
exist, as may readily be verified. By a theorem of Khinchine,³ however, its
values satisfy a law of large numbers. It will be of interest to inquire about the
"strength" of this law of large numbers for $\psi(x)$. Namely, given a positive
number $\epsilon$, how many "observations" (independent estimates) $\psi(x)$ will suffice
to guarantee probabilities of .50, .90, .95, etc. for the following inequality to
hold

(4)    $\left|\dfrac{\psi(x_1) + \psi(x_2) + \cdots + \psi(x_n)}{n} - \dfrac{1}{a}\right| < \epsilon$

where n is the number of "observations."

In order to arrive at a rough answer to this question we have made use of
certain inequalities due to Tshebysheff (Tshebysheff's "method of moments",
cf. Uspensky⁴). Let u be an arbitrary stochastic variable whose distribution
has moments of the first and second order which are known. Denote by m its
first moment, by $\sigma^2$ its variance, then it results from Tshebysheff's theory that
the probability $P(u_1, u_2)$ for a value of u to lie between $u_1$ and $u_2$ (i.e. $u_1 \le u \le u_2$)
satisfies the inequality


2 


2 
o o 
(5) PRE 8 ae tare 


This inequality is independent of the values, or even the existence, of further 
moments of the u-distribution beyond the second, and depends only on the 
condition that the cumulant of the distribution function shall have at least three 
“points of increase.” 

Although $\psi(x)$ does not have a second moment, a second moment does exist
for those values of $\psi(x)$ which correspond to $x \ge -\theta > -\infty$, where $\theta$ is an
arbitrary number, positive or negative. If we can estimate the first two
moments of $\psi(x) \sim 1/x$ corresponding to a given value of $\theta$, then for a given number
n of observations we need only to divide the corresponding variance by n to
obtain $\sigma^2$ in (5), then multiply (5) by the nth power of the (normal) probability
for the inequality $x \ge -\theta$, in order to obtain a lower bound for the probability
of the inequality (4). $\theta$ is to be determined so as to yield a maximum result.

The first moment m of $\psi(x)$ for values of $x \ge -\theta$ is easily computed, and is
given by the formula

(6)    $m = \dfrac{1}{a}\left\{1 - \dfrac{R_{-\theta/\sigma_x}}{R_{-(\theta+a)/\sigma_x}}\right\}.$


3 J. V. Uspensky, Introduction to Mathematical Probability, pp. 195, McGraw-Hill (1937). 
4 J. V. Uspensky, l.c. pp. 365 ff.







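The second moment is harder to compute, but if we place

(7)    $\phi(\theta) = K \cdot (m_2 - m^2) = \dfrac{1}{\sqrt{2\pi}\,\sigma_x}\int_{-\theta}^{\infty}\psi^2(x)\exp\left(-\dfrac{(x-a)^2}{2\sigma_x^2}\right)dx - \dfrac{\left[\int_{-\theta}^{\infty}\psi(x)\exp\left(-\dfrac{(x-a)^2}{2\sigma_x^2}\right)dx\right]^2}{\sqrt{2\pi}\,\sigma_x\int_{-\theta}^{\infty}\exp\left(-\dfrac{(x-a)^2}{2\sigma_x^2}\right)dx},$

       $K = \dfrac{1}{\sqrt{2\pi}\,\sigma_x}\int_{-\theta}^{\infty}\exp\left(-\dfrac{(x-a)^2}{2\sigma_x^2}\right)dx = \dfrac{1}{\sqrt{2\pi}}\int_{-(\theta+a)/\sigma_x}^{\infty} e^{-t^2/2}\,dt,$

we easily obtain the relationship

(8)    $\phi'(\theta) = \dfrac{1}{\sqrt{2\pi}\,\sigma_x^3}\exp\left(-\dfrac{(\theta+a)^2}{2\sigma_x^2}\right)\left\{R_{-\theta/\sigma_x} - \dfrac{\sigma_x}{a}\left(1 - \dfrac{R_{-\theta/\sigma_x}}{R_{-(\theta+a)/\sigma_x}}\right)\right\}^2.$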
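The step from (7) to (8) can be sketched as follows. The abbreviations f, A and B are introduced here for convenience and are not part of the original notation; $\psi$ denotes the function defined by (2), m the first moment given by (6), and K the normal probability of the inequality $x \ge -\theta$.

```latex
% f is the normal density of x; A, B, K are the truncated integrals entering (7).
\begin{align*}
f(x) &= \frac{1}{\sqrt{2\pi}\,\sigma_x}\exp\!\left(-\frac{(x-a)^2}{2\sigma_x^2}\right), \\
A(\theta) &= \int_{-\theta}^{\infty}\psi^2 f\,dx, \qquad
B(\theta) = \int_{-\theta}^{\infty}\psi f\,dx, \qquad
K(\theta) = \int_{-\theta}^{\infty} f\,dx, \\
\phi(\theta) &= A - \frac{B^2}{K}, \qquad m = \frac{B}{K}, \\
\phi'(\theta) &= f(-\theta)\left[\psi(-\theta)^2 - 2\,\psi(-\theta)\frac{B}{K} + \frac{B^2}{K^2}\right]
  = f(-\theta)\,\bigl[\psi(-\theta) - m\bigr]^2 .
\end{align*}
```

Each of A, B and K has derivative equal to its integrand evaluated at $x = -\theta$. Substituting $\psi(-\theta) = R_{-\theta/\sigma_x}/\sigma_x$ and the value of m from (6) then gives an expression of the form (8), which is non-negative, so that $\phi$ is non-decreasing in $\theta$.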


Fig. 1. Lower bounds of probabilities that means are within 10% of true value.
Abscissa: numbers of observations (logarithmic scale, 100 to 1000); ordinate: probabilities.

Fig. 2. Lower bounds of probabilities that means are within 25% of true value.
Abscissa: numbers of observations (logarithmic scale, 10 to 100); ordinate: probabilities.


From (7), using a table of the probability integral, it can be verified that
$\phi(-a - 3\sigma_x) < 0.001$. Assume, therefore, as a boundary condition
$\phi(-a - 3\sigma_x) = 0$; then (8) can be integrated graphically or numerically. It is by this
means that the curves shown in Figs. 1 and 2 were determined. Computations
were also attempted for $a/\sigma_x = 3$, $a/\sigma_x = 1$, but it was not possible to obtain
significant results: it would be necessary in these cases to take more than two
moments into account, which would lead to hopeless complications. In these
figures the ordinates represent probabilities for an observation to fall between
.90a and 1.11a (Figure 1), and between .75a and 1.33a (Figure 2), respectively.























3. Two practical formulas for computations. It seems worthwhile to note
here two simple formulas in connection with Mills' ratio (2) which will be useful
for computations. The first is the obvious relationship

(9)    $R_{-x} = \sqrt{2\pi}\,e^{x^2/2} - R_x = 1/z - R_x$

in the notation of Pearson's tables. The second applies to large values of x,
and may be written

(10)   $\dfrac{x}{x^2 + 1} \le R_x \le \dfrac{1}{x}.$

(10) is true for x > 0, and can be proved by means of the differential equation
which $\psi(x)$ satisfies.
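Both formulas are easy to check numerically. The Python sketch below is an illustration only: the function name `mills_ratio` is introduced here, the ratio is evaluated through `math.erfc` rather than from tables, and the sample arguments are arbitrary.

```python
import math

def mills_ratio(x):
    """R_x = exp(x^2/2) * integral from x to infinity of exp(-t^2/2) dt."""
    return math.sqrt(math.pi / 2.0) * math.exp(0.5 * x * x) * math.erfc(x / math.sqrt(2.0))

def reflected(x):
    """Right-hand side of (9): R_{-x} expressed through R_x."""
    return math.sqrt(2.0 * math.pi) * math.exp(0.5 * x * x) - mills_ratio(x)

sample_points = (0.5, 1.0, 2.0, 5.0, 10.0)
bounds = []
for x in sample_points:
    lower, upper = x / (x * x + 1.0), 1.0 / x      # the two sides of (10)
    bounds.append((lower, mills_ratio(x), upper))
    print(x, lower, mills_ratio(x), upper)

# Formula (9) compared with a direct evaluation at negative arguments.
reflection_ok = all(
    math.isclose(mills_ratio(-x), reflected(x), rel_tol=1e-9)
    for x in (0.5, 1.0, 2.0, 5.0)
)
print(reflection_ok)
```

For large x the two bounds in (10) differ only by about $1/x^3$, which is why the inequality is useful as an approximation beyond the range of the tables.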


4. Remarks. The estimate $\psi(x)$ has the following inadequacy: If only a single
observation x is known, then it is unknown whether a is of like or unlike sign
compared to x. It turns out then that the mathematical expectation for the
value of $\psi(x)$ vanishes identically. This difficulty of course disappears if more
than one observation is available. Methods of avoiding this difficulty for time
series, e.g. by noting relative frequencies for observations separated by 1, 2, 3
etc. intervals to agree in sign, will be discussed elsewhere in connection with
practical applications.

It may be worthwhile to note that Geary⁵ developed certain characteristics
of the distribution of a quotient, which however are not adapted to our purposes.


NOTE ON CONFIDENCE LIMITS FOR CONTINUOUS DISTRIBUTION 
FUNCTIONS 


By A. Wald* and J. Wolfowitz


In a recent paper [1] we discussed the following problem: Let X be a stochastic
variable with the cumulative distribution function f(x), about which nothing is
known except that it is continuous. Let $x_1, \cdots, x_n$ be n independent, random
observations on X. The question is to give confidence limits for f(x). We
gave a theoretical solution when the confidence set is a particularly simple and
important one, a "belt."

A particularly simple and expedient way from the practical point of view is
to construct these belts of uniform thickness ([1], p. 115, equation 50). If the
appropriate tables, as mentioned in our paper, were available, the construction
of confidence limits, no matter how large the size of the sample, would be
immediate.

5 Geary, R. C., "The Frequency Distribution of a Quotient," Jour. Roy. Stat. Soc.,
Vol. 93 (1930), pp. 442-446.

* Columbia University, New York City.

Our formulas (11), (16), (19), (27) and (29) are not very practical for
computation, particularly when the samples are large. We have recently learned that
there exists a result by Kolmogoroff [2], generalized by Smirnoff [3],¹ which for
large samples gives an easy method for constructing tables, i.e. of finding $\alpha$
when c and n are given (all notations as in [1]). The result of
Kolmogoroff-Smirnoff is:

Let $c = \lambda/\sqrt{n}$. Then for any fixed $\lambda > 0$,

$\lim_{n \to \infty} \overline{P} = \lim_{n \to \infty} \underline{P} = 1 - e^{-2\lambda^2},$

$\lim_{n \to \infty} P = 1 - 2\sum_{m=1}^{\infty} (-1)^{m-1} e^{-2m^2\lambda^2}.$

This series converges very rapidly. 
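The limiting formulas lend themselves to direct computation. The following Python sketch is a hedged illustration: the function names are introduced here, the values are large-sample approximations only, and the stopping tolerance and the sample value 1.36 are choices made for this example rather than anything taken from the papers cited.

```python
import math

def one_sided_limit(lam):
    """Limit 1 - exp(-2 lam^2) for a one-sided belt."""
    return 1.0 - math.exp(-2.0 * lam * lam)

def two_sided_limit(lam, max_terms=200, tol=1e-16):
    """Limit 1 - 2 * sum_{m>=1} (-1)^(m-1) exp(-2 m^2 lam^2) for lam > 0."""
    total = 0.0
    for m in range(1, max_terms + 1):
        term = (-1) ** (m - 1) * math.exp(-2.0 * m * m * lam * lam)
        total += term
        if abs(term) < tol:      # later terms are negligible
            break
    return 1.0 - 2.0 * total

def belt_confidence(c, n):
    """Approximate confidence coefficient for a two-sided belt of half-width c."""
    return two_sided_limit(c * math.sqrt(n))

# Successive terms of the series for lam = 1.36, showing the rapid convergence.
terms = [math.exp(-2.0 * m * m * 1.36 ** 2) for m in (1, 2, 3)]
print(terms)
print(two_sided_limit(1.36), one_sided_limit(1.36), belt_confidence(0.136, 100))
```

For values of $\lambda$ near 1 the second term of the series is already several orders of magnitude smaller than the first, so one or two terms suffice for table making.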


REFERENCES 


[1] Wald and Wolfowitz, "Confidence limits for continuous distribution functions,"
Annals of Math. Stat., Vol. 10 (1939), pp. 105-118.

[2] A. Kolmogoroff, "Sulla determinazione empirica di una leggi di distribuzione,"
Giornale dell' Instituto Italiana degli Attuari, Vol. 11 (1933).

[3] N. Smirnoff, "Sur les ecarts de la courbe de distribution empirique," Recueil
Mathematique (Mathematicheski Sbornik), New series, Vol. 6 (48) (1939), pp. 3-26.


1 In the French résumé of Smirnoff's article, on page 26, due to a typographical error
this formula is given with a factor $(-1)^m$ instead of the correct factor $(-1)^{m-1}$. The
correct result follows from equation (112), page 23, of the Russian text when t is set equal
to zero.
















REPORT OF THE CHICAGO MEETING OF THE INSTITUTE 


The Sixth Annual Meeting of the Institute of Mathematical Statistics was 
held at the Stevens Hotel, Chicago, Thursday to Saturday, December 26 to 28, 
1940 in conjunction with the meetings of the American Statistical Association, 
the Econometric Society, and the American Marketing Association. The fol- 
lowing fifty members of the Institute attended the meeting: 


H. E. Arnold, C. S. Barrett, A. G. Brooks, R. W. Burgess, A. G. Clark, A. C. Cohen, Jr., 
W. G. Cochran, A. T. Craig, C. C. Craig, B. B. Day, W. E. Deming, J. L. Doob, P. S. Dwyer,
Churchill Eisenhart, J. W. Fertig, P. G. Fox, Hilda Geiringer, E. J. Gumbel, Myron Heid- 
ingsfield, Harold Hotelling, Leo Katz, J. F. Kenney, L. F. Knudsen, Alma Kohl, T. Koop- 
mans, D. H. Leavens, Ida Levin, G. A. Lundberg, S. N. Lyttle, W. G. Madow, Ralph 
Mansfield, G. F. T. Mayer, J. R. Miner, E. C. Molina, C. R. Mummery, J. I. Northam, 
E. G. Olds, P. S. Olmstead, A. L. O’Toole, J. A. Pierce, Wilhelm Reitz, P. R. Rider, M. M. 
Sandomire, L. W. Shaw, W. A. Shewhart, F. F. Stephan, S. A. Stouffer, A. G. Swanson, 
S. S. Wilks, M. O. Woodbury. 


The opening session, on Thursday afternoon, was devoted to contributed 
papers in probability and statistical methodology. The Chairman was Professor 
S. S. Wilks of Princeton University, and the following papers were presented: 


1. On the Calculation of the Probability Integral on Non-Central t and an Application. 
C. C. Craig, University of Michigan. 
2. Effective Methods of Graduation. 
Max Sasuly, Office of the Actuary, Social Security Board. 
3. On Some New Results in the Sampling of Discrete Random Variables. 
William G. Madow, Bureau of the Census. 
4. On the Use of Inverse Probability in Sample Inspection. 
W. Edwards Deming and W. G. Madow, Bureau of the Census. 
5. On a Convergent Iterative Procedure for Adjusting a Sample Frequency Table when
Some of the Marginal Totals are Known. 
F. F. Stephan, Cornell University, and W. Edwards Deming, Bureau of the Census. 
6. The Return Period of Flood Flows. 
E. J. Gumbel, New School for Social Research, New York City. 
7. A Note on the Power of a Sign Test. 
W. M. Stewart, University of Wisconsin. 
8. A New Explanation of Non-Normal Dispersion. 
Hilda Geiringer, Bryn Mawr College. 


Abstracts of these papers follow this report. 


On Friday morning a session was held jointly with the American Marketing 
Association on The Theory and Application of Representative Sampling. Under
the chairmanship of Professor Theodore H. Brown of Harvard University, the 
following papers were presented: 


1. Background and Method. 
F. F. Stephan, Cornell University.








2. Application to Marketing Problems. 
Archibald M. Crossley, New York City. 

3. Application to Agricultural Problems. 

Arnold J. King, Iowa State College. 








The afternoon session on Friday was held jointly with the American Statis- 
tical Association and Econometric Society on The Analysis of Variance. The 
chair was held by Professor P. R. Rider of Washington University and the fol- 
lowing papers were presented: 







1. The Relation Between the Design of an Experiment and the Analysis of Variance. 
A. E. Brandt, Soil Conservation Service. 
2. The Underlying Principles of the Analysis of Variance and Associated Tests of 
Significance. 
Churchill Eisenhart, University of Wisconsin. 
3. The Applications of the Analysis of Variance to Non-Orthogonal Data. 
W. G. Cochran, Iowa State College. 
Discussion: 
Gertrude M. Cox, North Carolina State College. 
John F. Kenney, University of Wisconsin Extension Division. 
W. Edwards Deming, Bureau of the Census. 















On Saturday morning and afternoon, sessions were held with the American 
Statistical Association on Collection and Use of Statistics for Quality Control in 
National Defense Industries. At the morning session the following papers were 
given, with Dr. C. W. Gates of the Western Electric Company in the chair: 







1. Report on the Quality Control Program of the American Standards Association. 
John Gaillard, Western Electric Company. 

2. Sample Verification in the Administration of the Population Census. 
W. Edwards Deming, Bureau of the Census. 

3. The Importance of the Statistical Viewpoint in High Production Manufacturing. 
P. L. Alger, General Electric Company. (Read by C. Eisenhart.) 

4. On the Initiation of Statistical Methods for Quality Control in Industry. 

Leslie E. Simon, Aberdeen Proving Ground. 












At the afternoon session the following papers were presented under the chair- 
manship of Dr. John Johnston of the United States Steel Corporation: 






1. The Place of Statistical Analysis in Ferrous Metallurgy. 
E. M. Schrock, Jones and Laughlin Steel Corporation. 
2. Statistical Methods in the Production and Inspection of Cast Iron Pipe. 
J. T. MacKenzie, American Cast Iron Pipe Company. 
3. Applications of Statistical Methods to Metallurgy. 
R. B. Mears, Aluminum Company of America.
Discussion: 
Churchill Eisenhart, University of Wisconsin. 












The annual business meeting of the Institute was held on Thursday afternoon 
after the session on probability and statistical methodology, with the President 
presiding. 

The Secretary-Treasurer read the financial report for 1940. 








The Editor of the Annals of Mathematical Statistics reported on the progress 
of the Annals during 1940. It was stated that manuscripts worthy of publica- 
tion were now being submitted at a rate that would justify the publication of 
a 500-page annual volume. To make this amount of publication self-supporting 
upon the expiration of the Rockefeller grant in June, 1941, it was pointed out 
that another 150 new subscriptions would have to be obtained during 1941. 
Judging from the rate at which subscriptions had been coming in during the 
past two years such an increase was considered entirely feasible with the coopera- 
tion of the members of the Institute. Various methods of effecting this increase 
were discussed at the meeting and suggested for the consideration of the Board 
of Directors. 

On behalf of the Board of Directors the President made the following report: 

1. The Report of the War Preparedness Committee, approved in preliminary 
form at the Hanover meeting, had been preprinted and some of the preprints 
had already been distributed. 

2. Arrangements had been made with the Executive Officer of the National 
Roster of Scientific and Specialized Personnel to send the statistics check list 
to all members of the Institute who are not members of the American Statistical 
Association. 

3. That preprints of the pamphlet on The Teaching of Statistics, including an 
address by Professor Harold Hotelling, discussion by Dr. W. E. Deming and 
the resolutions on the teaching of statistics adopted by the Institute at the 
Dartmouth meeting had been produced and distributed. 

4. That application¹ had been made to the Executive Committee of the American
Association for the Advancement of Science through the Permanent Secretary
for admission to the status of an affiliated society in the Association.

It was announced that through the annual election, carried out by mail ballot, 
the following officers were elected for 1941 (all names being those proposed by 
the Nominating Committee): 

President: Professor Harold Hotelling 

Vice-Presidents: Professor A. T. Craig 

Professor H. C. Carver 

Secretary-Treasurer: Professor E. G. Olds 

The annual luncheon was held at noon on Friday with the President-Elect 
presiding. Short talks were made by Dr. E. J. Gumbel, Dr. T. Koopmans and 
Professor S. S. Wilks, while the annual luncheon address was delivered by 
Professor P. R. Rider. 

P. R. Rider,
Secretary-Treasurer. 


1 This application was approved by the Executive Committee of the A.A.A.S. at its 
December 1940 meeting. 





ABSTRACTS OF PAPERS 


(Presented on December 26, 1940, at the Chicago meeting of the Institute) 


On the Calculation of the Probability Integral on Non-Central t and an Appli- 
cation. C. C. Craig, University of Michigan. 


It seems not to have been noted that the probability integral for non-central t can be 
calculated by means of an infinite series in incomplete β-functions which converges rapidly 
for small samples. The application here considered is to a test based on the randomization 
principle which is the subject of E. J. G. Pitman’s paper: Significance tests which may be 
applied to samples from any populations (Roy. Stat. Soc. Jour., Vol. 4 (1937), pp. 119-130). 
In case the samples come from normal populations with equal variance but with unequal 
means, the chance that the hypothesis of equal population means will be accepted on this 
test is given by this probability integral which is evaluated in some illustrative numerical 
examples. 


On Some New Results in the Sampling of Discrete Random Variables. William 
G. Madow, Bureau of the Census. 


Many statistical tables may be regarded as the result of subsampling finite populations 
classified into r × s × ··· tables. The main aim of this paper is to derive the associated 
statistical theory including both the finite and limiting distributions. After evaluating 
the fundamental distributions and the moments it is shown that under certain conditions, 
the limiting distribution is multinomial, while under other conditions the limiting distribu- 
tion is multivariate normal. These results are then applied to determine the adequate size 
of sample, and the sampling proportions from various strata. 


On the Use of Inverse Probability in Sample Inspection. W. Edwards Deming 
and William G. Madow, Bureau of the Census. 


The theory of inspection by sampling is abstractly equivalent to one part of the theory 
of subsampling. The theory of subsampling finite populations is considered in this paper 
in order to investigate the differences that occur when the methods of fiducial inference and 
inverse probability are used, particularly in regard to determining the adequate size of 
sample. In sample inspection, the prior distribution of failures is almost always known, 
at least approximately. In using any system of sample inspection, a number of failures will 
pass undetected. On the basis of certain prior distributions of failures, distributions are 
derived for the number and percent of failures remaining after each of several different 
possible systems of sample inspection has been applied. Formulas giving the cost of partial 
inspection are used together with these distributions in order to determine methods of 
sample inspection having various desired properties. 
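
The use of inverse probability described above may be illustrated by a minimal sketch. It assumes a lot of known size from which a sample is drawn without replacement, and a prior distribution on the number of failures in the lot; the function name, the uniform prior, and all numbers are illustrative assumptions and are not taken from the paper. The sketch returns the distribution of the number of failures remaining in the uninspected part of the lot, given the number found in the sample.

```python
from math import comb


def posterior_remaining(lot_size, sample_size, found, prior):
    """Distribution of failures left uninspected, given `found` in the sample.

    `prior` maps each possible number of failures D in the lot to its prior
    probability (an assumed input; the abstract gives no specific prior).
    """
    weights = {}
    for failures, prior_prob in prior.items():
        # Skip lot compositions that could not have produced this sample.
        if found > failures or sample_size - found > lot_size - failures:
            continue
        # Hypergeometric likelihood of the observed sample.
        likelihood = (
            comb(failures, found)
            * comb(lot_size - failures, sample_size - found)
            / comb(lot_size, sample_size)
        )
        weights[failures - found] = prior_prob * likelihood
    total = sum(weights.values())
    if total == 0:
        raise ValueError("observed sample is impossible under the prior")
    return {remaining: w / total for remaining, w in weights.items()}


# Illustrative case: lot of 4, sample of 2, no failures found,
# prior uniform on 0, 1 or 2 failures in the lot.
uniform_prior = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}
posterior = posterior_remaining(4, 2, 0, uniform_prior)
```

Combining such a distribution with a cost function for partial inspection, as the abstract describes, would be a further step that this sketch does not attempt.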


On a Convergent Iterative Procedure for Adjusting a Sample Frequency Table 
When Some of the Marginal Totals are Known. Frederick F. Stephan, 
Cornell University and W. Edwards Deming, Bureau of the Census. 


The 5 per cent sample taken with the 1940 Population Census presents an interesting 
problem of estimation in which the estimates are connected by equations of condition. 
These equations arise from the fact that certain sums of estimates derived from the sample 
should equal the corresponding frequencies derived from the tabulations of the census 
enumeration, i.e. the distribution of each of several variables may be known but their 
joint distribution may only be estimated from a cross tabulation of the data furnished by 
the sample. The adjustment of the sample estimates is accomplished by the principle of 
least squares and an outline of the various types of conditions for two and three variables 
is presented. The solution of the normal and condition equations is tedious when hundreds 
of sets of estimates must be adjusted but a simple iterative procedure is available (see 
Annals of Math. Stat., Vol. 11 (1940), pp. 427-444). 
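
The abstract does not spell out the iterative procedure. A minimal sketch of one scheme of this kind, in which the rows and then the columns of the sample table are rescaled in turn until both sets of known marginal totals are reproduced, is given below. The function name, the tolerance, and the 2 × 2 example are illustrative assumptions, and the sketch assumes positive cell counts and marginal totals that share the same grand total.

```python
def adjust_to_margins(table, row_totals, col_totals, tol=1e-9, max_iter=1000):
    """Alternately rescale rows and columns until both margins match."""
    cells = [[float(v) for v in row] for row in table]
    for _ in range(max_iter):
        # Scale each row so that it sums to its known total.
        for i, target in enumerate(row_totals):
            row_sum = sum(cells[i])
            if row_sum > 0:
                cells[i] = [v * target / row_sum for v in cells[i]]
        # Scale each column so that it sums to its known total.
        for j, target in enumerate(col_totals):
            col_sum = sum(row[j] for row in cells)
            if col_sum > 0:
                for row in cells:
                    row[j] *= target / col_sum
        # Columns are now exact; stop once the rows also agree.
        worst = max(abs(sum(row) - t) for row, t in zip(cells, row_totals))
        if worst < tol:
            break
    return cells


# Illustrative sample counts with assumed known census margins.
sample = [[30, 20], [10, 40]]
adjusted = adjust_to_margins(sample, row_totals=[600, 400], col_totals=[500, 500])
```

Each pass leaves the column totals exact and brings the row totals closer, so the check on the rows alone is enough to decide when to stop.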


The Return Period of Flood Flows. E. J. Gumbel, New School for Social 
Research (N. Y.) 


For any statistical variable the return period is defined as the mean number of trials 
necessary in order that a certain value of the variable or a greater one returns. The return 
period is a theoretical statistical function such as the distribution or the probability. In 
hydraulics the corresponding observed values are the recurrence and exceedance intervals. 

The main thesis is that the flood flows are the largest values of flows which have to be con- 
sidered as unlimited variables. The method of return periods applied to the largest values 
leads without further assumptions to a formula which gives the return period T(x) of a flood 
superior to x, and at the same time the most probable flood to be reached not at a certain 
time, but within a certain period. This formula contains only two constants, which are 
linear functions of the mean annual flood and the standard deviation. Fuller’s formula 
turns out to be an asymptotic expression of my formula. 

This method applied to the Connecticut, Columbia, Merrimack, Cumberland, Tennessee 
and Mississippi rivers shows a very good fit between theory and observation, superior to 
the methods applied heretofore. 
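
The formula itself is not reproduced in the abstract. The sketch below assumes that the annual floods follow the double-exponential distribution of largest values, with its two constants obtained from the mean annual flood and the standard deviation by the method of moments; under that assumption both constants are linear functions of the two statistics. The function names and the numerical values are illustrative and do not come from the paper.

```python
import math

EULER_GAMMA = 0.5772156649015329


def gumbel_constants(mean_flood, sd_flood):
    """Location and scale of the assumed double-exponential law (moments)."""
    scale = sd_flood * math.sqrt(6) / math.pi
    location = mean_flood - EULER_GAMMA * scale
    return location, scale


def return_period(x, location, scale):
    """Mean number of years until a flood of size x or greater returns."""
    prob_not_exceeded = math.exp(-math.exp(-(x - location) / scale))
    return 1.0 / (1.0 - prob_not_exceeded)


def flood_for_return_period(years, location, scale):
    """Flood size whose return period is `years` (years must exceed 1)."""
    return location - scale * math.log(-math.log(1.0 - 1.0 / years))


# Illustrative values only: mean annual flood 1000, standard deviation 400.
location, scale = gumbel_constants(1000.0, 400.0)
hundred_year_flood = flood_for_return_period(100.0, location, scale)
```

The return period is the reciprocal of the probability of exceedance in a single trial, which is the mean of the corresponding geometric waiting time and agrees with the definition given at the head of the abstract.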


A Note on the Power of the Sign Test. W. M. Stewart, University of Wis- 
consin. 


Let us consider a set of N non-zero differences, of which x are positive and N − x are 
negative; and suppose that the hypothesis tested, H₀, implies in independent sampling 
that x will be distributed about an expected value of N/2 in accordance with the binomial 
(½ + ½)^N. As a quick test of H₀, we may choose to test the hypothesis h₀ that x has the 
above probability distribution. Defining r to be the smaller of x and N − x, the test con- 
sists in rejecting h₀ and therefore H₀ whenever r < r(ε, N), where r(ε, N) is determined by 
N and the significance level ε. 

In applying such a test it is of interest to know how frequently it will lead to a rejection 
of H₀ when H₀ is false and the actual situation H implies that the probability law of x is 
(q + p)^N, with p ≠ ½, thereby indicating an expectation of an unequal number of + and − 
differences. The probability of rejecting H₀ when H₁, implying p = p₁, is true, is termed the 
power of the test of H₀ relative to the alternative H₁. 

A table is given for the 5% significance level (ε = .05) showing the minimum value of 
N for which the power of the test relative to p = p₁ exceeds β for values of β from .05 to .95 
at intervals of .05; and for p₁ from .60 to .95 (and thereby for p₁ from .40 to .05) at intervals 
of .05. The case of β > .99 is also considered for these values of p₁. 
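
The quantities tabulated can be computed directly from the binomial law. The sketch below is one way of doing so and is not the author's method of computation. It assumes that rejection occurs when r does not exceed c, where c is the largest integer for which the probability of rejection under p = ½ stays at or below the significance level; on the strict-inequality reading of the abstract, r(ε, N) would correspond to c + 1. The function names and the search limit are illustrative assumptions.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k positive differences out of n."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def critical_value(n, eps=0.05):
    """Largest c with P(r <= c) <= eps when p = 1/2, or -1 if there is none."""
    c, size = -1, 0.0
    while 2 * (c + 1) < n:
        # Both tails contribute equally under the null hypothesis.
        next_size = size + 2 * binom_pmf(c + 1, n, 0.5)
        if next_size > eps:
            break
        c, size = c + 1, next_size
    return c


def power(n, p, eps=0.05):
    """Probability of rejection when the chance of a + difference is p."""
    c = critical_value(n, eps)
    return sum(binom_pmf(k, n, p) + binom_pmf(n - k, n, p) for k in range(c + 1))


def min_sample_size(p, beta, eps=0.05, n_max=500):
    """Smallest n up to n_max whose power exceeds beta, else None.

    n_max is kept modest because comb(n, k) is converted to a float.
    """
    for n in range(1, n_max + 1):
        if power(n, p, eps) > beta:
            return n
    return None
```

A call such as `min_sample_size(0.70, 0.80)` would give one entry of a table of the kind described. Because x is discrete, the power does not rise steadily with N, so the function reports the first N at which the required power is exceeded.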


A New Explanation of Non-Normal Dispersion. Hilda Geiringer, Bryn 
Mawr College. 


The starting point of the Lexis theory consists in this fact: It is to be expected, on the 
average, that two expressions $\Sigma$ and $\Sigma'$ which can be computed from the results of $m \cdot n$ 
observations are equal, provided that the corresponding $m \cdot n$ chance variables $x_{\mu\nu}$ are 
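equally and independently distributed. Let $a_\mu$ be the average $a_\mu = \frac{1}{n} \sum_{\nu=1}^{n} x_{\mu\nu}$ and $a$ the 
average of the $a_\mu$ ($\mu = 1, \ldots, m$). Then 

[Display equation for $\Sigma$ and $\Sigma'$ illegible in source; surviving fragment: $(x_{\mu\nu} - a)^2$.] 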

We see, however, that rows and columns do not play the same role here because $\Sigma$ depends 
only on the $a_\mu$, the average values of the rows. If the observed value of $\Sigma$ happens to be 
larger (smaller) than the value of $\Sigma'$, we speak of supernormal (subnormal) dispersion. 
It is well known that supernormal dispersion can be explained by assuming that the $m \cdot n$ 
theoretical populations are only equal “by rows” but not by columns (there are m different 
distributions); in the same way one can explain the case of subnormal dispersion by admit- 
ting that the distributions are equal “by columns,” but not by rows. 

Another explanation which may sometimes seem more plausible is the following: All 
the $m \cdot n$ distributions are supposed to be equal, but we omit the assumption of mutual in- 
dependence. Then one can prove that the supernormal or subnormal dispersion corresponds 
respectively to an appropriately defined “positive” or “negative correlation.” The fact 
that normal dispersion occurs rather rarely in social questions is then reflected by the idea 
that social phenomena are in fact not independent of each other but are usually only as- 
sumed so for the purpose of simplicity. In that way the more frequent occurrence of 
supernormal dispersion likewise finds an adequate explanation. 













